And the worst data migration practices oscar goes to..
Siebel! Yes, the CRM software Oracle owns for some time;
http://www.oracle.com/applications/crm/siebel/index.html
I think this application is one of the best example of Kyte’s “database independent” black box applications, which usually means designed to work on every database platform but born to perform and scale bad on all of them. The reason is simple, it do not use any feature of your already purchased database software like partitioning, parallel query or even constraints.
Oh but Siebel loves indexes, yes believe it or not they indexed nearly each column of their heacy OLTP write tables, so I think they earned this oscar properly :)
Last week my help was requested for a Siebel data migration. After several minutes of monitoring I thought of leaving the building immediately. The migration utility inside Siebel did not allow you to define hints, like APPEND, did not support parallel query option, indexes on huge tables were not allowed to be dropped, so huge UPDATEs and INSERTs produced enormous amount of redo etc. And the root cause conclusion was of course easy, I/O capability of the system was bad :)
Only thing I was able to help was taking the availability risk and keeping target database on NOARHIVELOG mode, creating huge(10 GB each) single membered redo log files, re-collecting appropriate optimizer statistics on source and increasing buffer cache on both ends up to the physical limits of the machines. Before these actions group was unhappy with 500,000 rows per hour processing and afterwards it was 3,000,000 rows per hour and everybody was thankful. But I believe this 20 days and night data migration could finish within hours if Siebel was only an Oracle database friendly application, meaning if it allowed dropping indexes before migration starts and rebuilding after migration ends, allowing CREATE TABLE as SELECT in PARALLEL and NOLOGGING instead of these huge UPDATEs and INSERTs. I experienced TBs of data moving around within hours with these simple data migration best practices.
So let’s hope maybe some day Oracle will force the tools it buys to have an Oracle friendly version also and lets its customers to utilize their purchases.
ps: if you want to continue reading on data migration experiences of mine please check these out;
Direct Path Load Operations versus Integrity Constraints, Triggers and Instance Recovery
http://tinyurl.com/6xwxyt
I am back :)
http://tinyurl.com/6gk2wo
Some Best Practices of Upgrading Oracle Database
http://tinyurl.com/5ds6hv
Data Guard Redo Apply and Media Recovery Best Practices 10g
This morning I was requested to check a 10gR2 physical standby instance which was slow to apply redo and the lag was growing rapidly, my initial check showed me 9 to 12 minutes duration for 250 MB redo log file to apply;
Mon May 5 06:03:17 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902209.arc Mon May 5 06:13:20 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902210.arc Mon May 5 06:23:07 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902211.arc Mon May 5 06:32:51 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902212.arc Mon May 5 06:42:40 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902213.arc Mon May 5 06:52:01 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902214.arc Mon May 5 07:00:52 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902215.arc Mon May 5 07:09:58 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902216.arc ..
After a quick look from unix SQL*Plus as sysdba the Top 5 waits were;
PX Deq: Test for msg (4,5x amount) free buffer waits log file sequential read PX Deq Credit: send blkd checkpoint completed
And after going through the Note:387343.1 and Data Guard Redo Apply and Media Recovery Best Practices 10gR2 I recommended to increase the buffer cache from 512 MB to 6 GB, reduce the recovery parallelism from 20 to 8 and also some additional parameters like;
*.db_cache_size=6G *.log_buffer=78643200 *.large_pool_size=2G *.pga_aggregate_target=10G *.workarea_size_policy='AUTO' *.parallel_min_servers=360 *.parallel_max_servers=520 *.parallel_threads_per_cpu=2 *.optimizer_mode=ALL_ROWS *.optimizer_dynamic_sampling=4 *.cursor_sharing=EXACT RECOVER MANAGED STANDBY DATABASE PARALLEL 8;
After the restart as I monitored the apply process from the alert.log again, now the database was able to apply 250 MB redo log file within 30 seconds and 1 minutes;
Mon May 5 09:01:07 2008 ALTER DATABASE RECOVER automatic from '/assos_arcdir' standby database PARALLEL 8 Mon May 5 09:01:07 2008 Media Recovery Start Managed Standby Recovery not using Real Time Apply parallel recovery started with 8 processes Mon May 5 09:01:52 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902217.arc Mon May 5 09:02:29 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902218.arc Mon May 5 09:03:41 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902219.arc Mon May 5 09:04:39 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902220.arc Mon May 5 09:05:33 2008 Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902221.arc ..
And the Top 5 waits were now;
PX Deq: Test for msg (x amount) log file sequential read checkpoint completed db file parallel write latch: checkpoint queue latch
So as my lessons learned, I really want to highlight two things, yes again and again these two;
1- If you have enough physical memory, benefit from it to the ends, do not run your production databases with small SGA(critical for OLTP) or PGA(critical for OLAP) especially after 9iR2(nearly no latching problem like the old days with large SGAs). Come on even on laptops we use some GBs of sized SGAs nowadays.
2- Parallelism does not mean faster, I benefit from reducing the degree of parallelism most often than compared to increasing it, 4 or 8 is nearly anytime is enough for my needs. I hope Oracle invests more in automatic parallelism tuning feature in the near future which is FALSE by default on 10gR2 and like automatic undo or automatic pga management also automatic parallel tuning takes its place in my most liked automatic tuning Oracle features :)
Here are some of the sql scripts I used during above troubleshooting from sql*plus, like in above standby case sometimes we only have sql*plus and some simple v$ queries to gather required troubleshooting information.
Event 10046 as a vitamin
This event is not only useful for tuning purposes, whenever I experience a weird behavior of the database engine usually also Oracle support requests the trace file produced with this event since the content shows any database calls made inside the traced application.
Today one of my colleagues asked me if he can unwrap a wrapped pl/sql application. I asked him if he really needs to unwrap it or what he is after may be the SQL calls inside that wrapped application. And showed him the tkprof report of the event 10046 trace of a sample wrapped application, he loved it since it was a kind of a reverse engineering of the pl/sql api they were supplied by a third party company which is not around anymore.
Code Listing - Tracing a Wrapped PL-SQL Code Example
Above tkprof report shows that wrapping a simple pl/sql code like that still adds 57 internal SQL statement calls, 20 hard parses and 208 consistent gets. So before buying a wrapped application and deploying it to your production it may be better to ask yourself twice if you could force more the vendor to get the sources to avoid any performance cost and a possible future reverse engineering effort.
Recently I used 10046 event output to troubleshoot a weird ORA-xxxxx error problem with DBMS_CDC_SUBSCRIBE supplied package and I tried to reverse the database calls of APEX application development environment by starting the trace from a database login trace trigger. After getting used to 10g’s beauties like ASH my needs to use 10046 tracing are getting very limited, but I am sure event 10046 will remain as a powerful tool in my Oracle toolbox for years.
Anyway if you need more information on wrapping a pl/sql application please check Note:341504.1 - FAQ: Wrapping PL/SQL Source Code or you persist to unwrap a wrapped pl/sql please check this presentation :)
A DDL tacking application, if CASCADE option of DBMS_STATS needed everytime, ORA_HASH post update and others
This post will be like a summary of last two weeks of mine. First of all, yesterday we celebrated my company’s 15th year, I feel very lucky to be a part of this success story for my last 8 years :)
Lately I was and will be mostly working in two projects, in parallel;
1- Migrating our billing/crm database from Tru64 to AIX platform; there are several important lessons learned in this cross-platform xx TB Oracle database migration, I will be blogging about these in details later. For now I will be sharing below Oracle-L discussion of mine, a DDL tracking application which was requested to audit and control especially the database objects which are not supported by the change data capture product Goldengate in this post.
- Oracle 10gR2 Multi-Terabyte Database Cross-Platform Migration Method
- Cross-Platform Tablespace Transportation from Oracle documentation
- Oracle 10gR2 Multi-Terabyte Database Cross-Platform Migration a Case Study
You may find the DDL(Data Definition Language, statements used to define the database structure or schema) types in Oracle 10gR2 here. In this small application I used a database trigger and an autonomous transaction based procedure to log the operation, db_name, login_user, object_owner, object_name, object_type, sql_text, attempt_by and attempt_date informations.
A custom DDL tacking application for Oracle database
Also in one of my previous posts I discussed on 10g’s ORA_HASH SQL function as a data validation tool. During a row by row migration(unload on source>ftp text to traget>load texts parallely on target or like in our case insert into select from over dblink plus apply changes captured by a CDC tool on production during this logical data transfer kind of strategies) a validation need becomes very important.
This validation must involve three major parts in my point of view;
1- the metadata validation,
2- the data validation,
3- the performance validation.
I made several updates on the previous post, if you are interested on this topic please follow this link.
2- Design and development of an operational data source in front of our datawarehouse; this environment will differ from datawarehouse in two major areas, first is it will not be denormalized like a typical dwh and be very similar to the operational database’s tables, so transaformation will be very limited compared to a standard ETL. Second from this environment customers want to have near-real time reporting, so Change Data Capture(CDC) kind of data transfer methods are needed.
We will be first implementing an operational data source for our CRM database which runs Siebel application. There are over 60 typical operational near-real time reports need. Oracle Warehouse Builder 11g release has a new feature to integrate a warehouse with Siebel source databases, we will also be testing the performance and capabilities of this new feature.
But as I commented here we will also be working on a logical standby implementation to have the ETL window flexibility in order to be near-real time.
During this project we also had an interesting discussion on DBMS_STAT package’s two options; CASCADE and GRANULARITY. In Choosing An Optimal Stats Gathering Strategy post of Greg Rahn I commented about this discussion. For an 10g Release 2 datawarehouse CASCADE option may be left to FALSE since after each rebuild at the end of a typical daily ETL optimizer already collects statistics. And for the GRANULARITY option to leave it to its default usually will be enough. These two things may help you avoid unnecessary gathering, and reduce total ETL time of course. Below is a small demo for CASCADE option.
Is DBMS_STATS package’s CASCADE option need to be set to TRUE always?
And for the OTHERS part my three highlights are as following;
1- My team Fenerbahçe defeated English Chelsea in Istanbul, but in London we couldn’t make it to the Champions League semi-finals, in my opinion Chelsea isn’t playing good this year and we missed a very important opportunity. My team has young and unexperienced players, but with this much hungry for success for the coming years I am nearly sure we will force the first 4 again and again :)
2- This week at last wordpress domain is back in my country, nearly for a year now we have been playing hacking games.
3- Last words are for the Oracle Critical Patch Update Announcement April 2008 which also includes two nasty vulnerabilities for APEX.
Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0
References Used :
N/A
Alberto Dell’Era’s investigations and scripts on Oracle optimizer
When it comes to Oracle Optimizer Jonathan Lewis, Christian Antognini and Wolfgang Breitling have the best resources I have ever read. And now I will add Alberto Dell’Era to above three - http://www.adellera.it/investigations/index.html
ps: if you still didn’t see this video presentation take a look ;) Optimising through Understanding by Jonathan Lewis
A little more on External Tables
I have written on external tables before;
External table’s performance for a row and undocumented Oracle functions
Unload Data with External Tables and Data Pump
This time last week several questions from our ETL group guided me to test a little more external tables on Solaris 10 and 10g Release 2. Some of the interesting questions were;
a- can we read compressed text data over external tables,
b- can we read data over symbolic linked(ln -s) files with external tables,
c- what is the performance difference of accessing an external table compared to a heap organized and partitioned table, what are the access paths and join methods.
So let me share with you some of my findings parallel to the above questions;
a- Since the extracted files were so huge and needed so much extra disk space to uncompress this was needed. Unix pipes are like magic, for example you can export to a unix pipe and read import data through the pipe, here is an example; http://www.orafaq.com/faqunix.htm#EXPORT
Within an external table creation ddl it is possible to use a unix pipe instead of a file and you can start an uncompress process to the pipe before reading the external table with dbms_scheduler for example. But external tables only work with text files as inputs directly and this is documented; http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tables.htm#sthref2448
b- The ETL tool we used created the extracted text files under different folders and each partition’s data with the same file name. So in order to read this information over an external table even they are all uncompressed we needed to create some symbolic links. But as I linked the documentation above even the files were functional over symbolic links at operating system, Oracle created the external tables without any error but when we tried to access the data KUP-04001: error opening file error was inevitable.
Anyway, god bless sed and awk for getting this linking script output so easily!
Code Listing 208a - AWK and SED magic to create ln -s script from an Ab initio map file
Also here are resources for awk and sed on OTN;
AWK: The Linux Administrators’ Wisdom Kit By Emmett Dulaney
Using the sed Editor By Emmett Dulaney
c- The access path’s for joining two external tables or an external table with a heap organized table were HASH JOIN BUFFERED for equality conditions and SORT-MERGE JOIN for non-equality conditions, nothing unexpected here. I used SQL*Plus’s timing and autotrace features also Kyte’s runstats package during my comparisons. With autotrace statistics recursive calls and consistent gets were major differences where as with runstats outputs pga memory max, session pga memory, DB time statistics and process queue reference latch was higher with external tables and table scan rows gotten, physical read bytes statistics were very higher with heap organized tables. Still the elapsed time of heap organized tables were ~5 to 10 times less for the same operations compared to the external tables.
Code Listing b - Simple performance tests with External Tables
As a summary I may easily advice that external tables are not best to access several times but they are here for our Create Table as Select(CTAS) statements and compared to SQL*Loader the Select part of the CTAS is much more flexible for our transformation needs. Also IMHO tools perform well only when they are used for they were advised and designed to, so feeling unsatisfactory when experiencing an external table not reading through compressed files is really funny to me.
Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0
References Used :
Managing External Tables
PL/SQL based Oracle Database Loader(PSODL) Release 2.0 on sourceforge.net
Hüsnü announced the ancestor of this package a while ago in his Loading Oracle series. I developed this second release based on his idea and opened a project on sourceforge.net for the community’s contribution. In his announcement above Hüsnü argues on simple looping over concurrent loading with the insert performance of an index organized versus a heap organized table example. The first example I will be giving for concurrent load will be a famous OLTP example, using bind variables, and bitmap indexes locking behavior will be the second to come with another post.
But before getting into details with examples some marketing of course :) What is the need(or want) for PSODL;
- PSODL is completely PL/SQL based, so it is easy to setup, customize and use it for an Oracle database developer or administrator,
- PSODL is parametric, any pl/sql block can be configured for your loading requirements,
- PSODL by default produces each thread’s elapsed timing, top 3 latching and session statistics based on V$ views(based on Kyte’s runstats package),
- PSODL can be configured to produce event 10046 level 8 sql trace file for detailed profiling analysis if wanted,
- With PSODL’s outputs you can easily know the average and standard deviation statistics of your applications for the determinism of their response times under load,
- PSODL can be used on any Oracle editions and with releases higher than 10gR1(since DBMS_SCHEDULER is used).
Some warnings and constraints;
- PSODL errors can be followed from the database’s alert log file and dba_scheduler_job_run_details.status column,
- PSODL loads your scenarios at the very same time, but normally even on a heavy loaded OLTP system some 10s of the same requests won’t be starting at the very same time, so while setting up your scenarios better to remember this fact,
- PSODL can also be used on 9i or 8i releases if you replace DBMS_SCHEDULER with DBMS_JOB supplied package,
- PSODL produces trace files under your background dump destination since the threads you will be tracing here are job processes.
And some possible next release features;
- A brief Installation and User Guide will hopefully be documented,
- LATCH and STATS reporting will be enhanced,
- Statspack and after 10g ASH-AWR-ADDM performance tuning options will be investigated for possible integration,
- An Apex application will be developed for getting the parameters and reporting the outputs(charts) of the load scenario.
Below demonstrations were done under sys schema of a system as of;
SunOS 5.10 Generic_118833-17 sun4u sparc SUNW,Sun-Fire-880
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
Code Listing 204a - bind vs. nobind demo for simple looping
runstats package source and setup
Code Listing 204b - bind vs. nobind demo for concurrent loading
pkg_cncrnt_loader package source and setup
Fighting against any kind of myth or guess requires testing, but your testing quality and strategy must be appropriate for your needs, if not you may be introducing new myths most possibly. PSODL will always have some theoretical boundaries but I believe that in time customized PSODLs will assist you more compared to simple looping for your projects’ success.
ps: since PSODL project on sourceforge.net is still waiting on pending status you may use this link temporarily for second release setup information.
Performance Tuning Protocol - Part 2
In Part 1 I tried to summaries why we need these tools and when to use which one. Also additional to the post at the comments I mentioned dbms_sqltune(10g), dbms_monitor(10g) packages and _trace_files_public parameter. With this additional post I will mention some more tools of course :)
Wait Interface related tools
As I mentioned in Part 1 10g introduces a number of enhancements designed to automate and simplify the performance diagnostic process. Grid control and dynamic performance views like v$session, v$session_wait, v$active_session_history etc. will simply guide you to the root causes of your performance problems. Perform Without Waiting article on Oracle Magazine July/August 2004 by Arup Nanda is a very good starting point as usual. Also Kyle Hailey’s Oracle Wait Interface Introduction presentation is another good reference.
Open source tools
There are a lot of these kind of scripts, tools around. One thing to remember is always to test them on your test databases first.
Kyle Hailey’s direct SGA access project and ASH monitor
Tanel Poder’s Oracle Session Snapper and Session Wait script
Metalink tools and references
If you have access to Oracle support site I suggest to download and try these tools on your test system, especially I love SQLTXPLAIN.SQL since it provides a summary report like pilot’s console during a flight, anything you may need related to a query is in front of you :)
Note:215187.1 - SQLTXPLAIN.SQL - Enhanced Explain Plan and related diagnostic info for one SQL statement
Note:243755.1 - Implementing and Using the PL/SQL Profiler
Note:224270.1 - Trace Analyzer TRCANLZR - Interpreting Raw SQL Traces with Binds and/or Waits generated by EVENT 10046
Note:238684.1 - SQLAREAT.SQL - SQL Area, Plan and Statistics for Top DML (expensive SQL in terms of logical or physical reads)
Note 39817.1 - Interpreting Raw SQL_TRACE and DBMS_SUPPORT.START_TRACE output
Note:62294.1 - The DBMS_SUPPORT Package
And if you still need more tools and have a budget Quest’s http://www.quest.com/spotlight-on-oracle and http://www.quest.com/SQL-Optimizer-for-Oracle are my favorites :)
“It is all about understanding.”
These words belong to Tom Kyte, written on the preface of Jonathan Lewis’s latest book Cost Based Fundamentals of Oracle. If you are paying your bills as an Oracle database developer or dba without doubt it is of course all about understanding the optimizer, why it does what it does as Kyte mentions.
Last year we were very excited about this book and immediately organized a weekend summit :) to study as a group. The members were our internship program students and the weekend was a real success. We prepared an hour presentation for each chapter, group discussion was very effective and educative.
So this year we planned to repeat this event with the support of the new internees of last summer. Another good news is my company, Turkcell Teknoloji, will be our sponsor this year, for the place and the foods :)
Opening Preface Never say “never again” - H.Tonguç Yılmaz
Chapter 1 What do you mean by cost? - Merve Olamlı
Chapter 2 Tablescans - Merve Olamlı
Chapter 3 Single table selectivity - Ersin Ünkar
Chapter 4 Simple B-tree access - Özay Akdora
Chapter 5 The Clustering Factor - Engin Zorlu
Chapter 6 Selectivity Issues - Mert İnan
Chapter 7 Histograms - Özgür Macit
Chapter 8 Bitmap Indexes - Aykut Öztürk
Chapter 9 Query Transformation - Baran İpek
Chapter 10 Join Cardinality - Engin Kurtoğlu
Chapter 11 Nested Loops - Övünç Bozcan
Chapter 12 Hash Joins - Ertük Diriksoy
Chapter 13 Sorting and Merge Joins - Ertürk Diriksoy
Chapter 14 The 10053 trace file - Hüsnü Şensoy
Appendix A Upgrade Headaches - Hüsnü Şensoy
Appendix B Optimizer Parameters - Hüsnü Şensoy
Additional topic : Oracle 11g OCI New Features - Ogün Heper
Here are two additional resources to the book I may advice;
Jonathan Lewis’s comments on each chapter
Wolfgang Breitling’s presentations and articles
Thanks to Jonathan Lewis for his efforts and making this book possible, it is a great value to the Oracle community and we are waiting for the triology to continue :)
Also as we have Jonathan’s permission to share the presentations we prepare for the weekend studies, I will be updating this post with the material available ~10 days later, see you than.
Oracle OLAP features performance testing Part 2
In Part 1 I shared our testing environment setup, read performance tests and their results and some important resources for data warehousing with Oracle. Before continuing with the write cost of the options in this post, let me mention two things; first I am sorry that I can not share all details of the table structures and queries etc. in these posts, so please accept these posts as non-technical. Second, I think in both worlds, OLAP or OLTP doesn’t matter, first thing to do is to govern your end users needs and wants, if you let them govern you then the result will be most probably a very complex(flexible they will call:) model which will inevitably end up with of course lots of challenging performance issues. When it comes to OLAP this complexity at the transformation step of your ETL may cost you a lot, especially if we are talking about some XX TBs of daily change.
During this post I will be using “BI” for Bitmap Index, “MV” for Materialized View and “BJI” for Bitmap Join Index.
a. Cost of Bitmap Indexes
We planned seven BIs on our main fact table for star transformation tests with different kinds of queries having different filtering levels. During below tests there were no MV logs or BJIs in the test schema;
Listing 192a - Cost of Bitmap Indexes
-- a simple local bitmap index creation template CREATE BITMAP INDEX BI.. NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ; -- %2 and %10 sampled data inserted to simulate ETL cost create table FACT_SMP2 nologging parallel pctused 99 pctfree 1 as select /*+ PARALLEL */ * from FACT SAMPLE(2); insert /*+ append parallel(a) */ into FACT a select /*+ PARALLEL(b) */ * from FACT_SMP2 b ; commit ;
b. Cost of Materialized Views
For our reporting needs we created three different kinds of summaries. And during refresh tests;
- all BIs were dropped,
- all MVs were created on a single partition of the main fact because of the test system’s limited resources,
- because of our ETL character we tested %2(2,642,727 rows) and %10(13,210,698 rows) sampling inserts before MVs incremental refreshes
Listing 192b - Cost of Materialized Views
alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity=enforced;
-- a simple template of MV log for incremental refresh of MVs
DROP MATERIALIZED VIEW LOG ON FACT ;
CREATE MATERIALIZED VIEW LOG ON FACT
WITH ROWID, SEQUENCE(all_fact_columns_joined_with_dimensions, ..)
INCLUDING NEW VALUES;
-- a simple template to create a MV
DROP MATERIALIZED VIEW MV ;
CREATE MATERIALIZED VIEW MV
PCTFREE 1 PCTUSED 99 INITRANS 7 MAXTRANS 255
STORAGE ( INITIAL 50M NEXT 50M MINEXTENTS 5 MAXEXTENTS 512 PCTINCREASE 0 FREELISTS 7)
NOLOGGING PARALLEL
BUILD DEFERRED
REFRESH FORCE ON DEMAND
WITH ROWID
ENABLE QUERY REWRITE AS
SELECT ...
-- a simple template to FULL refresh of MV and gather statistics
exec dbms_mview.refresh('MV', method => 'C', parallelism => 4, atomic_refresh=>false);
exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size,
cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
commit ;
-- a simple template to INCREMENTAL refresh of MV and gather statistics
exec dbms_mview.refresh('MV', method => 'F', parallelism => 4);
exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size,
cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
commit ;
MVs bring important performance costs in order to be kept fresh and compared to star transformation with BIs MVs read performance advantage, this cost may be something to be avoided.
c. Cost of Bitmap Join Indexes
BJIs advantages and disadvantages are summarized like below in the documentation;
Advantages -
Good performance for join queries, space efficient
Especially useful for large dimension tables in star schemas
Disadvantages -
Maintenance costs are higher, building or refreshing a bitmap join index requires a join
More indexes are required, up to one index per dimension-table column rather than one index per dimension table is required
During our tests for reporting needs appropriate BJIs were nearly impossible to create. Additionally to create a BJI you need to have a primary or unique key constraint on the referenced table.
-- BJI requires a PRIMARY or UNIQUE key constraint CREATE UNIQUE INDEX UI.. ON REFERENCED_DIMENSION(..) NOLOGGING PARALLEL COMPUTE STATISTICS ; -- will use above index by default ALTER TABLE REFERENCED_DIMENSION ADD CONSTRAINT PK_REFERENCED_DIMENSION PRIMARY KEY (..) NOLOGGING PARALLEL ; -- a simple BJI creation template CREATE BITMAP INDEX BJI.. ON FACT(columns_will_be_used_on_referenced_dimension, ..) FROM FACT, REFERENCED_DIMENSION WHERE .. NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ; BJI 1 creation time and size Elapsed: 04:56:25.04 15,625 GB BJI 2(an alternative for BJI 1) creation time and size Elapsed: 04:59:28.05 15,625 GB
For this series the question was when to choose which Oracle OLAP option;
- Star Transformation with Bitmap Indexes or
- Summary Management with Materialized Views or
- Bitmap Join Indexes or
- of course full and parallel partitioned table scans with hash joins
I may easily say after ending the first round of the performance tests on the test system it will all depend on your own reporting needs(read) and ETL character(write). So you need to test both read and write ends and decide for each condition’s best fit. Usually you will end up with a mixture of these solutions I guess. And remember that some of the performance gains may be so attractive that these results might guide you to modify your model, so it is always better to have this kind of performance test results during the development phase.
Last words goes to 10G Release 2(10.2.0.3) ; you have my word on that all mentioned options are very stable, so they really worth your testing efforts on this release and truncate all your past experiences :)