In Part 1 I shared our testing environment setup, read performance tests and their results and some important resources for data warehousing with Oracle. Before continuing with the write cost of the options in this post, let me mention two things; first I am sorry that I can not share all details of the table structures and queries etc. in these posts, so please accept these posts as non-technical. Second, I think in both worlds, OLAP or OLTP doesn’t matter, first thing to do is to govern your end users needs and wants, if you let them govern you then the result will be most probably a very complex(flexible they will call:) model which will inevitably end up with of course lots of challenging performance issues. When it comes to OLAP this complexity at the transformation step of your ETL may cost you a lot, especially if we are talking about some XX TBs of daily change.
During this post I will be using “BI” for Bitmap Index, “MV” for Materialized View and “BJI” for Bitmap Join Index.
a. Cost of Bitmap Indexes
We planned seven BIs on our main fact table for star transformation tests with different kinds of queries having different filtering levels. During below tests there were no MV logs or BJIs in the test schema;
-- a simple local bitmap index creation template CREATE BITMAP INDEX BI.. NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ; -- %2 and %10 sampled data inserted to simulate ETL cost create table FACT_SMP2 nologging parallel pctused 99 pctfree 1 as select /*+ PARALLEL */ * from FACT SAMPLE(2); insert /*+ append parallel(a) */ into FACT a select /*+ PARALLEL(b) */ * from FACT_SMP2 b ; commit ;
b. Cost of Materialized Views
For our reporting needs we created three different kinds of summaries. And during refresh tests;
- all BIs were dropped,
- all MVs were created on a single partition of the main fact because of the test system’s limited resources,
- because of our ETL character we tested %2(2,642,727 rows) and %10(13,210,698 rows) sampling inserts before MVs incremental refreshes
alter session set query_rewrite_enabled=true; alter session set query_rewrite_integrity=enforced; -- a simple template of MV log for incremental refresh of MVs DROP MATERIALIZED VIEW LOG ON FACT ; CREATE MATERIALIZED VIEW LOG ON FACT WITH ROWID, SEQUENCE(all_fact_columns_joined_with_dimensions, ..) INCLUDING NEW VALUES; -- a simple template to create a MV DROP MATERIALIZED VIEW MV ; CREATE MATERIALIZED VIEW MV PCTFREE 1 PCTUSED 99 INITRANS 7 MAXTRANS 255 STORAGE ( INITIAL 50M NEXT 50M MINEXTENTS 5 MAXEXTENTS 512 PCTINCREASE 0 FREELISTS 7) NOLOGGING PARALLEL BUILD DEFERRED REFRESH FORCE ON DEMAND WITH ROWID ENABLE QUERY REWRITE AS SELECT ... -- a simple template to FULL refresh of MV and gather statistics exec dbms_mview.refresh('MV', method => 'C', parallelism => 4, atomic_refresh=>false); exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size, cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL'); commit ; -- a simple template to INCREMENTAL refresh of MV and gather statistics exec dbms_mview.refresh('MV', method => 'F', parallelism => 4); exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size, cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL'); commit ;
MVs bring important performance costs in order to be kept fresh and compared to star transformation with BIs MVs read performance advantage, this cost may be something to be avoided.
c. Cost of Bitmap Join Indexes
BJIs advantages and disadvantages are summarized like below in the documentation;
Good performance for join queries, space efficient
Especially useful for large dimension tables in star schemas
Maintenance costs are higher, building or refreshing a bitmap join index requires a join
More indexes are required, up to one index per dimension-table column rather than one index per dimension table is required
During our tests for reporting needs appropriate BJIs were nearly impossible to create. Additionally to create a BJI you need to have a primary or unique key constraint on the referenced table.
-- BJI requires a PRIMARY or UNIQUE key constraint CREATE UNIQUE INDEX UI.. ON REFERENCED_DIMENSION(..) NOLOGGING PARALLEL COMPUTE STATISTICS ; -- will use above index by default ALTER TABLE REFERENCED_DIMENSION ADD CONSTRAINT PK_REFERENCED_DIMENSION PRIMARY KEY (..) NOLOGGING PARALLEL ; -- a simple BJI creation template CREATE BITMAP INDEX BJI.. ON FACT(columns_will_be_used_on_referenced_dimension, ..) FROM FACT, REFERENCED_DIMENSION WHERE .. NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ; BJI 1 creation time and size Elapsed: 04:56:25.04 15,625 GB BJI 2(an alternative for BJI 1) creation time and size Elapsed: 04:59:28.05 15,625 GB
For this series the question was when to choose which Oracle OLAP option;
- Star Transformation with Bitmap Indexes or
- Summary Management with Materialized Views or
- Bitmap Join Indexes or
- of course full and parallel partitioned table scans with hash joins
I may easily say after ending the first round of the performance tests on the test system it will all depend on your own reporting needs(read) and ETL character(write). So you need to test both read and write ends and decide for each condition’s best fit. Usually you will end up with a mixture of these solutions I guess. And remember that some of the performance gains may be so attractive that these results might guide you to modify your model, so it is always better to have this kind of performance test results during the development phase.
Last words goes to 10G Release 2(10.2.0.3) ; you have my word on that all mentioned options are very stable, so they really worth your testing efforts on this release and truncate all your past experiences :)