H.Tonguç Yılmaz – Oracle Blog

July 3, 2009

Interval Partitioning New Feature and Logging Option Problem

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 4:51 am

One of my old friends who is an Oracle DBA for at least 15 years always argued with me that “every new feature of Oracle is an evil!” :) He advised that it is not secure to upgrade a production environment to 10gR2 for example unless 11gR2 is released on your platform. Personally I always loved new features and tried to test them as soon as possible. I accept and also experience that there are always bugs around new features especially when they are complemented with some other feature but this is why we have Oracle support.

Recently on our 11.1 ODS production database we saw that the automatic created partitions do not inherit the LOGGING property of neither the table nor the tablespace it belongs to where as it is created always with LOGGING on. If you are trying to benefit from direct path inserts and your database is running on ARCHIVELOG mode and the target segments are LOGGING you will be failing to benefit and producing lots of redo and undo information depending on the size of data you are inserting. So an SR opened, bug escalated, a typical process.

Until 11.1 we had our own interval partitioning maintenance scripts and still we stick to them especially because the automatically created partitions’ naming. This is something I think will be evolving with the future releases since anybody who uses partitioning feature love to name their partitions and is not happy with anonymous partition names. But there is no evil with this and it is the life cycle of any software I think; a new feature comes out – some people test it/consume it and find bugs or report additional feature requests – the producer handles these within future releases or patches.

ps: for exchange of anonymous named partitions you may use EXCHANGE PARTITION FOR clause as a workaround but still with several limitations, you may check this post for more details.

April 3, 2009

Getting Started and Extending your Knowledge with Oracle Warehouse Builder 11g

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 1:08 pm

I already mentioned here and here that I love Apex and OWB since they are Oracle database integrated development environments, this is simply because I am an old school SQL and PL/SQL minded person and these two development environments make you feel more powerful, which is IMHO a really cool feeling during chatting with the Java/SOA guys :)

I have been using OWB 11g for some time now and I want to prepare a getting starting material to assist my colleagues. Apex has a two-days developer guide for this purpose, and OWB has its fantastic OTN OBEs.

If you are already aware of the BI fundamentals and Oracle database fundamentals you can go through these 16 11g OWB OBEs within a day or two and than I am pretty sure you will get the feeling I mentioned. As you may know with 11gR1 now OWB and Apex are a part of the standard database installation, so if you have a 11g installation you already have an OWB practice environment, following the first OBE will be enough to initialize your OWB inside your 11g installation.

These OBEs do not only include typical practices but also information on Warehouse Builder 11g architecture, components, BI concepts like Star Schema and Snowflake Schema or Slowly Changing Dimensions. Also here is an additional viewlet which may give you an idea about OWB. Remember what actions you do inside your mappings they will all deploy a standart PL/SQL package, these old friends are your ETL executables for your each BI need. And OWB is a cost-free(database-licence included) option and it is a highly optimized ETL tool which uses Oracle’s SQL capabilities.

And as some readers may be thinking since Oracle bought ODI OWB will be dead soon, no this not true and you may read the future strategy from OWB’s corporate blog here, to my understanding OWB and ODI will be a part of a combined BI solution strategy together, like a coffee and milk together, so this is not a coffee or a tea choise :)

December 30, 2008

Oracle’s Warehouse Builder for any kind of your Extraction-Transformation-Loading needs within Oracle database

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 1:06 pm

Warehouse Builder is a cost-free(part of the database license) ETL tool which produces PL/SQL when its mappings are deployed. The components you use to develop your data flows are based on Oracle’s SQL and PL/SQL capabilities.

In my opinion you really don’t need to own a warehouse to use OWB, with 11g now it is even coming installed with your database. So if you are also an Oracle database minded developer you better plan to check OWB 11g, similar to Apex I can guarantee that you will be having lots of fun and most probably some opportunity to develop your knowledge :)

Here is a hint showing how it can be possible to automate generation from your standard SQLs to OWB mappings.

And here you will find some important OWB hints.

December 29, 2008

Using Pipelined Table Functions for Siebel’s Audit_Read Data Transformation

It has been a while, I thought it is time that I should write on something? :)

25-12-2008 was my second anniversary on this blog: 233 posts on 18 categories resulted 383 approved comments and ~330,000 page views and lots of good friendships until now. So many thanks to all who contributed!

I was very surprised that I didn’t write on one of my best features of Oracle until today, the Pipelined Table Functions. Beginning with the 9i days I benefited from Pipelined Functions for lots of complex transformations needs.

Here with this post I will share a recent example need to demonstrate this feature. Siebel as a source system for our warehouse has a logging table called S_AUDIT_READ, this table has a CLOB column AUDIT_LOG which holds a string to be parsed with the rules mentioned in the below link.

How to parse audit log value (Doc ID 555553.1)

It can be possible to parse the string in several methods as usual, with even a single SQL of course. But PL/SQL is more readable/maintainable compared to SQL, so Pipelined Table Fuctions is always a good alternative to transform a massive data, in terms of performance also. In the below demonstration I also got rid of the algorithm mentioned above because of the cases in our data.

Siebel’s Audit_Read Data Transformation with Oracle’s Pipelined Table Functions

With the help of Pipelined Table Functions while reading the source data in parallel, it is possible to transform and insert to target table through pipes. For more examples please visit:

http://www.oracle-developer.net/display.php?id=207

http://psoug.org/reference/pipelined.html

http://www.oracle.com/technology/products/oracle9i/daily/may30.html

Just before closing, I wish a very happy new year to all of us! :)

October 12, 2008

OWB resources versus APEX resources

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 8:02 pm

Just like Apex OWB also has a very beneficial OTN forum and a Utility Exchange point to share tips and tricks. Both have Installation, Users Guides and OBEs.

But OWB has a corporate blog where as Apex(as far as I know) does not have a corporate one yet and as of today OWB does not have a 2 Day+ Developer’s Guide like Apex.

October 4, 2008

Understand Your Data with Oracle’s In-Database Statistical Functions

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 8:20 pm

Especially after 10g Oracle’s in-database functionalities like Analytics, Grouping and XML Functions, Regular Expressions and Model Clause are much more popular. Since they are in-database, all are scalable, performant and parallelizable, as a result knowing them makes a lot of difference. Here is a documentation linked list of 10.2 functions, unlike other statistical software vendors all these functionalities are FREE inside the Enterprise Edition license.

There are two more categories which requires mentioning: Oracle’s In-Database Data Mining and Statistical Functions. These statistical functions include descriptive statistics, hypothesis testing, correlations analysis, test for distribution fit, cross tabs with Chi-square statistics, analysis of variance(ANOVA), and Pareto analysis. Also DBMS_STAT_FUNCS.SUMMARY procedure summarizes all the basic descriptive statistics for a table’s attribute;


-- stats_mode: for most common value
select median(salary), stats_mode(salary) from employees ;

MEDIAN(SALARY) STATS_MODE(SALARY)
-------------- ------------------
          6200               2500

select stats_t_test_one(salary, 1000) two_sided_p_value from employees ;

TWO_SIDED_P_VALUE
-----------------
       8.2692E-27

select corr_s(employee_id, salary) coefficient from employees ;

COEFFICIENT
-----------
 -.04482358

set serveroutput on
declare
	s1 DBMS_STAT_FUNCS.SummaryType;
begin
    -- must be a numeric column
	DBMS_STAT_FUNCS.SUMMARY(p_ownername=> 'HR', p_tablename=> 'EMPLOYEES', p_columnname=>'SALARY', p_sigmavalue=>3, s=> s1);
	dbms_output.put_line('SUMMARY STATISTICS');
	dbms_output.put_line('Count: '||s1.count);
	dbms_output.put_line('Min: '||s1.min);
	dbms_output.put_line('Max: '||s1.max);
	dbms_output.put_line('Range: '||s1.range);
	dbms_output.put_line('Mean: '||round(s1.mean));
	dbms_output.put_line('Mode Count: '||s1.cmode.count);
	dbms_output.put_line('Mode: '||s1.cmode(1));
	dbms_output.put_line('Variance: '||round(s1.variance));
	dbms_output.put_line('Stddev: '||round(s1.stddev));
	dbms_output.put_line('Quantile 5 '||s1.quantile_5);
	dbms_output.put_line('Quantile 25 '||s1.quantile_25);
	dbms_output.put_line('Median '||s1.median);
	dbms_output.put_line('Quantile 75 '||s1.quantile_75);
	dbms_output.put_line('Quantile 95 '||s1.quantile_95);
	dbms_output.put_line('Extreme Count: '||s1.extreme_values.count);
	dbms_output.put_line('Extremes: '||s1.extreme_values(1));
	dbms_output.put_line('Top 3: '||s1.top_5_values(1)||','||s1.top_5_values(2)||','||s1.top_5_values(3));
	dbms_output.put_line('Bottom 3: '||s1.bottom_5_values(5)||','||s1.bottom_5_values(4)||','||s1.bottom_5_values(3));
end;
/
SUMMARY STATISTICS
Count: 107
Min: 2100
Max: 24000
Range: 21900
Mean: 6462
Mode Count: 1
Mode: 2500
Variance: 15283141
Stddev: 3909
Quantile 5 2500
Quantile 25 3100
Median 6200
Quantile 75 8900
Quantile 95 12700
Extreme Count: 1
Extremes: 24000
Top 3: 24000,17000,17000
Bottom 3: 2100,2200,2200

PL/SQL procedure successfully completed.

After 11g mining the unstructured data is possible with these functions. All these in-database functionality other than SQL have additional Java APIs, so cost is not the only advantage as SQL and Java are the most popular standard IT platforms of today. With all these in-database functionality Oracle database now is a sophisticated, scalable and secure analytical engine more than ever. :)

http://otn.oracle.com/bi

ps: 2nd BIWA summit presentations are available at http://www.oraclebiwa.org and currently it is free to join Oracle BIWA SIG. :)

March 11, 2008

A little more on External Tables

Filed under: Oracle Business Intelligence, Oracle How To, Oracle Performance — H.Tonguç Yılmaz @ 5:27 am

I have written on external tables before;
External table’s performance for a row and undocumented Oracle functions
Unload Data with External Tables and Data Pump

This time last week several questions from our ETL group guided me to test a little more external tables on Solaris 10 and 10g Release 2. Some of the interesting questions were;

a- can we read compressed text data over external tables,

b- can we read data over symbolic linked(ln -s) files with external tables,

c- what is the performance difference of accessing an external table compared to a heap organized and partitioned table, what are the access paths and join methods.

So let me share with you some of my findings parallel to the above questions;

a- Since the extracted files were so huge and needed so much extra disk space to uncompress this was needed. Unix pipes are like magic, for example you can export to a unix pipe and read import data through the pipe, here is an example; http://www.orafaq.com/faqunix.htm#EXPORT

Within an external table creation ddl it is possible to use a unix pipe instead of a file and you can start an uncompress process to the pipe before reading the external table with dbms_scheduler for example. But external tables only work with text files as inputs directly and this is documented; http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tables.htm#sthref2448

b- The ETL tool we used created the extracted text files under different folders and each partition’s data with the same file name. So in order to read this information over an external table even they are all uncompressed we needed to create some symbolic links. But as I linked the documentation above even the files were functional over symbolic links at operating system, Oracle created the external tables without any error but when we tried to access the data KUP-04001: error opening file error was inevitable.

Anyway, god bless sed and awk for getting this linking script output so easily!

Code Listing 208a – AWK and SED magic to create ln -s script from an Ab initio map file

Also here are resources for awk and sed on OTN;
AWK: The Linux Administrators’ Wisdom Kit By Emmett Dulaney
Using the sed Editor By Emmett Dulaney

c- The access path’s for joining two external tables or an external table with a heap organized table were HASH JOIN BUFFERED for equality conditions and SORT-MERGE JOIN for non-equality conditions, nothing unexpected here. I used SQL*Plus’s timing and autotrace features also Kyte’s runstats package during my comparisons. With autotrace statistics recursive calls and consistent gets were major differences where as with runstats outputs pga memory max, session pga memory, DB time statistics and process queue reference latch was higher with external tables and table scan rows gotten, physical read bytes statistics were very higher with heap organized tables. Still the elapsed time of heap organized tables were ~5 to 10 times less for the same operations compared to the external tables.

Code Listing 208b – Simple performance tests with External Tables

As a summary I may easily advice that external tables are not best to access several times but they are here for our Create Table as Select(CTAS) statements and compared to SQL*Loader the Select part of the CTAS is much more flexible for our transformation needs. Also IMHO tools perform well only when they are used for they were advised and designed to, so feeling unsatisfactory when experiencing an external table not reading through compressed files is really funny to me.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
Managing External Tables

January 30, 2008

Oracle OLAP features performance testing Part 2

Filed under: Oracle Business Intelligence, Oracle Performance — H.Tonguç Yılmaz @ 2:06 pm

In Part 1 I shared our testing environment setup, read performance tests and their results and some important resources for data warehousing with Oracle. Before continuing with the write cost of the options in this post, let me mention two things; first I am sorry that I can not share all details of the table structures and queries etc. in these posts, so please accept these posts as non-technical. Second, I think in both worlds, OLAP or OLTP doesn’t matter, first thing to do is to govern your end users needs and wants, if you let them govern you then the result will be most probably a very complex(flexible they will call:) model which will inevitably end up with of course lots of challenging performance issues. When it comes to OLAP this complexity at the transformation step of your ETL may cost you a lot, especially if we are talking about some XX TBs of daily change.

During this post I will be using “BI” for Bitmap Index, “MV” for Materialized View and “BJI” for Bitmap Join Index.

a. Cost of Bitmap Indexes

We planned seven BIs on our main fact table for star transformation tests with different kinds of queries having different filtering levels. During below tests there were no MV logs or BJIs in the test schema;

Listing 192a – Cost of Bitmap Indexes

-- a simple local bitmap index creation template
CREATE BITMAP INDEX BI.. NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ;

-- %2 and %10 sampled data inserted to simulate ETL cost
create table FACT_SMP2 nologging parallel pctused 99 pctfree 1 as
select /*+ PARALLEL */ * from FACT SAMPLE(2);

insert /*+ append parallel(a) */ into FACT a
select /*+ PARALLEL(b) */ * from FACT_SMP2 b ;
commit ;

b. Cost of Materialized Views

For our reporting needs we created three different kinds of summaries. And during refresh tests;
- all BIs were dropped,
- all MVs were created on a single partition of the main fact because of the test system’s limited resources,
- because of our ETL character we tested %2(2,642,727 rows) and %10(13,210,698 rows) sampling inserts before MVs incremental refreshes

Listing 192b – Cost of Materialized Views

alter session set query_rewrite_enabled=true;
alter session set query_rewrite_integrity=enforced;

-- a simple template of MV log for incremental refresh of MVs
DROP MATERIALIZED VIEW LOG ON FACT ;
CREATE MATERIALIZED VIEW LOG ON FACT
WITH ROWID, SEQUENCE(all_fact_columns_joined_with_dimensions, ..)
INCLUDING NEW VALUES;

-- a simple template to create a MV
DROP MATERIALIZED VIEW MV ;
CREATE MATERIALIZED VIEW MV
PCTFREE 1 PCTUSED 99 INITRANS 7 MAXTRANS 255
STORAGE ( INITIAL 50M NEXT 50M MINEXTENTS 5 MAXEXTENTS 512 PCTINCREASE 0 FREELISTS 7)
NOLOGGING PARALLEL
BUILD DEFERRED
REFRESH FORCE ON DEMAND
WITH ROWID
ENABLE QUERY REWRITE AS
SELECT ...

-- a simple template to FULL refresh of MV and gather statistics
exec dbms_mview.refresh('MV', method => 'C', parallelism => 4, atomic_refresh=>false);
exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size,
cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
commit ;

-- a simple template to INCREMENTAL refresh of MV and gather statistics
exec dbms_mview.refresh('MV', method => 'F', parallelism => 4);
exec dbms_stats.gather_table_stats( USER, tabname=>'MV', degree=>4, estimate_percent=>dbms_stats.auto_sample_size,
cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
commit ;

MVs bring important performance costs in order to be kept fresh and compared to star transformation with BIs MVs read performance advantage, this cost may be something to be avoided.

c. Cost of Bitmap Join Indexes

BJIs advantages and disadvantages are summarized like below in the documentation;

Advantages –
Good performance for join queries, space efficient
Especially useful for large dimension tables in star schemas

Disadvantages –
Maintenance costs are higher, building or refreshing a bitmap join index requires a join
More indexes are required, up to one index per dimension-table column rather than one index per dimension table is required

During our tests for reporting needs appropriate BJIs were nearly impossible to create. Additionally to create a BJI you need to have a primary or unique key constraint on the referenced table.

-- BJI requires a PRIMARY or UNIQUE key constraint
CREATE UNIQUE INDEX UI.. ON REFERENCED_DIMENSION(..) NOLOGGING PARALLEL COMPUTE STATISTICS ;   

-- will use above index by default
ALTER TABLE REFERENCED_DIMENSION
ADD CONSTRAINT PK_REFERENCED_DIMENSION
PRIMARY KEY (..) NOLOGGING PARALLEL ;

-- a simple BJI creation template
CREATE BITMAP INDEX BJI..
ON FACT(columns_will_be_used_on_referenced_dimension, ..)
FROM FACT, REFERENCED_DIMENSION
WHERE ..
NOLOGGING PARALLEL COMPUTE STATISTICS LOCAL ;

BJI 1 creation time and size
	Elapsed: 04:56:25.04
	15,625 GB

BJI 2(an alternative for BJI 1) creation time and size
	Elapsed: 04:59:28.05
	15,625 GB

For this series the question was when to choose which Oracle OLAP option;
- Star Transformation with Bitmap Indexes or
- Summary Management with Materialized Views or
- Bitmap Join Indexes or
- of course full and parallel partitioned table scans with hash joins

I may easily say after ending the first round of the performance tests on the test system it will all depend on your own reporting needs(read) and ETL character(write). So you need to test both read and write ends and decide for each condition’s best fit. Usually you will end up with a mixture of these solutions I guess. And remember that some of the performance gains may be so attractive that these results might guide you to modify your model, so it is always better to have this kind of performance test results during the development phase.

Last words goes to 10G Release 2(10.2.0.3) ; you have my word on that all mentioned options are very stable, so they really worth your testing efforts on this release and truncate all your past experiences :)

January 21, 2008

Oracle OLAP features performance testing Part 1

Filed under: Oracle Business Intelligence, Oracle Performance — H.Tonguç Yılmaz @ 10:26 am

Since I grew up in a small sea village I always loved fishing. But when I was young it was always hard to wait for a fish, so I think I preferred diving and instead of waiting I went after fishes :) After years of OLTP performance testing OLAP performance testing was a similar experience, I had to train myself again for patience, here in Türkiye there is an old saying; “Only a patient dervish attains his wish.”

Before getting into details I have mentioned importance of testing several times, again this time the results proved that somethings even written in the documentation may not be the optimum for your case.

We are one of the largest Telecom operators in Europe and our data warehouse database is of course a big one. We are on Solaris 10 and using Hitachi storage devices with Oracle 9i Release 2. With company’s aggressive marketing needs our data warehouse needed to be re-structured from bottom to top. So this was a great opportunity for me to advice both continuing with 10g Release 2 and using Oracle’s OLAP features like star transformation and materialized views which the development team once upon an Oracle release tested and didn’t like the outcomes.

There was no upgrade need, for the new 10g Release 2 environment and we had the luxury of setting up a completely different database. This was a big advantage of course for both not upgrading a xx TB database, getting rid of the old fragmented dictionary and staring with a fresh one.

Hüsnü, one of the two DBAs, made lots of interesting load tests with Orion and I hope he also will be blogging some of the interesting results he got. Below are some of the important resources helped us during our tests;

- Oracle’s Data Warehousing Documentation
- Oracle Database 10g: The Top 20 Features for DBAs by Arup Nanda
- Oracle By Example Series – Business Intelligence and Data Warehousing
- Using Basic Database Functionality for Data Warehousing
- Using Oracle’s Materialized Views Capabilities

So the initial step was to setup a test environment for a specific end user reporting need, after getting the example queries for two big fact and several dimension tables, we immediately started to load several partitions of the facts and all data for the dimensions. Our old friend SQL*Loader with its direct path and nologging options was there for us, which I believe one of the best tools Oracle has ever developed. So here are the next steps of the tests briefly for your informations and comments;

a. Collecting Object and System Statistics after the load

With 10g CBO is the only path, RBO is unsupported anymore. So we gathered object statistics at schema, dictionary and fixed table levels;

-- schema level
exec dbms_stats.gather_schema_stats( ownname =>'ABC', degree=>16, options=>'GATHER AUTO', estimate_percent=>dbms_stats.auto_sample_size, cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
-- dictionary level
exec dbms_stats.gather_dictionary_stats( degree=>16, options=>'GATHER AUTO', cascade=>TRUE);
-- fixed table level
exec dbms_stats.gather_fixed_objects_stats ;

And also collected system statistics;

SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';
exec dbms_stats.gather_system_stats('start');
-- produce some load, for both parallel multi-block and single-block reads
exec dbms_stats.gather_system_stats('stop');
SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';

After 10g there is an automated scheduler job, since RBO is unsupported, to gather statistics. To be more in control we disabled this job;

SELECT owner, job_name,enabled
FROM DBA_SCHEDULER_JOBS
WHERE JOB_NAME = 'GATHER_STATS_JOB';

BEGIN
  DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
  COMMIT;
END;
/

b. Automatic Storage Management(ASM) and optimizer parameters

Not only it eliminates the performance and budget cost of a file system layer, but especially because of its management advantages ASM with 10g Release 2 was our another test topic. Hüsnü may blog about his Orion test results of ASM versus Veritas file system, here I will only mention that we used external redundancy and set two important ASM parameters as Note:368055.1 advised;

Subject: Deployment of very large databases (10TB to PB range) with Automatic Storage Management (ASM)
_asm_ausize=16777216
_asm_stripesize=1048576

ASM disks are divided into allocation units of 1MB by default, but for data warehouses setting _asm_ausize higher like 16MB is important. Also like statistics for the optimizer there are several important OLAP settings which has to be tuned on 10g Release 2;

SELECT name, value FROM v$system_parameter a
WHERE a.NAME IN ('compatible', 'optimizer_features_enable', 'optimizer_mode', 'parallel_execution_message_size',
        'pga_aggregate_target', 'workarea_size_policy', 'query_rewrite_enabled',
        'query_rewrite_integrity', 'parallel_max_servers', 'disk_asynch_io',
        'db_file_multiblock_read_count', 'star_transformation_enabled')

Here we left behind our 9iR2 parameters and decided to move on with fresh parameters after 10gR2, our strategy was to leave a parameter to its default unless some tests prove that it has to be changed. Hüsnü again made several detailed tests for parameter settings of like db_file_multiblock_read_count and parallel_execution_message_size and we choosed the optimum values from those results.

c. Bitmap indexes and star transformation

After all above initial steps here comes the segment level initial setups. As we know Bitmap İndexes are best for OLAP usage because of their bitmap level(not row level) locking behavior. Also for a better bitmap index the cardinality, the amount of different values of the column, is important.

For Oracle to do star transformation,
1- We created single-column bitmap indexes on fact table’s all dimension keys. The query scenarios filters were all on these bitmap indexed columns on fact table to dimensions.
2- We created unique key constraints on dimension tables’ joined columns.
3- We also created appropriate indexes on dimension filtered columns.

Here we were excited to see if the second large fact table’s id column would be used or not, because of its high cardinality it was guessed to be having higher costs compared to a full-parallel table scan and hash join alternative. Depending on the filtering factors, selecting at most 5% to 10% of the fact, star transformation always performed better than full-parallel table scan and hash join alternative. Also bitmap index on the second fact table’s reference column was used by the optimizer. When the filtering was not selective than full-parallel table scan and hash join took its multi-block read advantage and performed much more better.

Below sql scripts were run several times and we have taken the average values, we needed to write some shell script in order to keep going at nights :)

himalaya@oracle $ nohup sqlplus ../.. @...sql > ...log 2> ...err &

-- inside sql
-- full-parallel table scan and hash join test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_full ;

-- be sure that this test is not doing star trans. or query rewr.
ALTER SESSION SET star_transformation_enabled=FALSE;
SELECT /*+ norewrite use_hash(cs,ct,tr,ta,pa,sr,dc) parallel(cs) parallel(ct) parallel(tr) parallel(ta) parallel(pa) parallel(sr) pa
rallel(dc) */
/* SENARYO 1 FULL */ ..

-- star transformation test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_bi ;

-- be sure that this test is not doing query rewr. and star trans. is enabled
ALTER SESSION SET star_transformation_enabled=TRUE;
SELECT /*+ norewrite STAR_TRANSFORMATION */
/* SENARYO 1 BI */ ...

d. Materialized views and query rewrite

And as a second option for read performance we wanted to see was materialized views. Of course since these are the already calculated summaries, they were the best over both star transformation and full access alternatives.

-- materialized view test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_mv ;

-- force to use mat.view
ALTER SESSION SET star_transformation_enabled=FALSE;
SELECT /*+ rewrite_or_error */
/* SENARYO 1 MV */ ..

d. Microstrategy reporting tool

We use MSTR for the end user reporting needs, so we needed to be sure on MSTR’s behavior after 10gR2. Some questions were, if its driver was up to date and more important since MSTR also does some query rewrite and caching will the same queries from MSTR have the same execution plans like SQL*Plus tests we had, this was our primary concern.

Here we created a database logon trigger for MSTR database user to catch its produced queries to be sure what was going on;

CREATE OR REPLACE TRIGGER trace_trig
AFTER LOGON
ON DATABASE
DECLARE
   sqlstr VARCHAR2(200) := 'ALTER SESSION SET EVENTS ''10046 TRACE NAME CONTEXT FOREVER, LEVEL 8''';
BEGIN
   IF (USER = 'MSTR') THEN
      execute immediate sqlstr;
   END IF;
END trace_trig;
/

And we saw that MSTR was adding some dimension key columns to the original queries which affected materialized view query rewrite option. Also MSTR was having much more waits compared to the simple SQL*Plus client which will be another area to study deeper later.

During this first part I wanted to briefly explain our testing needs, setups and read performance tests. In the second part more important issues will be the topic, the write cost of each option because of daily ETL and data quality updates.

Especially incremental refresh results over MATERIALIZED VIEW LOGs and DBMS_MVIEW.REFRESH will be important since they performed the best results during reading tests in part 1. Here we will also be getting the most out of 10gR2, with DBMS_ADVISOR.TUNE_MVIEW, DBMS_ADVISOR.CREATE_FILE and DBMS_MVIEW.EXPLAIN_MVIEW supplied functionalities.

PS : I also want to advice these three blogs which I think are really good starting points for data warehousing with Oracle;

- The Oracle Sponge – Oracle Data Warehouse Design and Architecture by David Aldridge – http://oraclesponge.wordpress.com/
- Pete Scott’s random notes – The rantings of a middle-aged computer consultant and generally nice person – http://pjsrandom.wordpress.com/
- Rittman Mead Consulting – “Delivering Oracle Business Intelligence” – http://www.rittmanmead.com/blog/

January 2, 2008

New partition pruning features of 10g

Filed under: Oracle 10g New Features, Oracle Business Intelligence, Oracle Performance — H.Tonguç Yılmaz @ 3:49 pm

As I mentioned earlier I started working with our data warehouse team on their migration project; Planning and Testing the 9iR2 to 10gR2 migration of our XXX Tb DWH

I will be testing Oracle’s 10gR2 features for performance. So before tests in details, I also wanted to mention some important very large Oracle database init.ora 10g parameters additional to ASM and CBO configurations I already mentioned; Listing 177a – Some important very large Oracle database init.ora 10g parameters

During some initial tests I experienced new partition pruning features of 10g; “PARTITION RANGE OR”-”KEY(OR)” and “PARTITION RANGE MULTI-COLUMN”-”KEY(MC)” access paths; Code Listing 177b – New partition pruning features of 10g

These tests will be really cool :)

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
Note:368055.1
Subject: Deployment of very large databases (10TB to PB range) with Automatic Storage Management (ASM)

Oracle Database 10g (10.2 & 10.1) New Features in Data Warehousing

Oracle10g-Features für Warehouses by Sven Vetter

Top Ten New Data Warehousing Features In Oracle Database 10g by Mark Rittman

Oracle Database 10g: Top Features for DBAs Release 2 Features Addendum by Arup Nanda
Part 4: Data Warehousing and Integration Features

Next Page »

Blog at WordPress.com.