Extract(Read) > Transform(Test&Apply) > Load(Learn) Blog

January 23, 2007

Oracle Business Intelligence and Extraction, Transformation, Loading(ETL) Part 2

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 9:58 am

In part 1 I mentioned the Parallel execution option of Oracle.

2- Introduction to Partitioning

Partitioning means breaking a large problem down into smaller, manageable pieces, the classic divide-and-conquer concept. In Oracle terms, partitioning refers to "dividing tables and indexes into manageable pieces."

The partitioning feature is used;
– To increase availability; this derives from the fact that partitions are independent entities,
– To ease administration burdens; scheduled, repetitive operations such as archiving old data would otherwise require massive INSERT, UPDATE and DELETE statements (lots of redo and undo I/O), whereas with partitions they become simple partition-level operations,
– To enhance DML and query performance; with partitioning, parallel DML can use many processes to perform your INSERT, UPDATE or DELETE instead of a single process.

Also storing partitions in separate tablespaces enables you to;

– Reduce the possibility of data corruption in multiple partitions
– Back up and recover each partition independently
– Control the mapping of partitions to disk drives (important for balancing I/O load)

As of Oracle 9i Release 2, Range, Hash, List or Composite (Range-Hash or Range-List) can be chosen as the partitioning method. The following demonstrations show the benefits of partitioning;

Code listing 64a : Convert A Partition Into A Stand-alone Table Example and Split Partition Example

Code listing 64b : Partition Elimination Example

Code listing 64c : OLTP and LOCAL versus GLOBAL index partitioning Example
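Since those code listings live behind external links, here is a minimal hedged sketch of the ideas they demonstrate; the table, tablespace and index names are made up for illustration:

-- hypothetical fact table, range partitioned by month, each partition in its own tablespace
CREATE TABLE sales_fact (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER
)
PARTITION BY RANGE (sale_date)
( PARTITION p_2006_12 VALUES LESS THAN (TO_DATE('01.01.2007','DD.MM.YYYY')) TABLESPACE ts_2006_12,
  PARTITION p_2007_01 VALUES LESS THAN (TO_DATE('01.02.2007','DD.MM.YYYY')) TABLESPACE ts_2007_01 );

-- a LOCAL index is equi-partitioned with its table
CREATE INDEX sales_fact_date_idx ON sales_fact (sale_date) LOCAL;

-- a filter on the partition key lets the optimizer eliminate partitions
SELECT SUM(amount)
FROM sales_fact
WHERE sale_date >= TO_DATE('01.01.2007','DD.MM.YYYY');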

In conclusion, Oracle gives us five table partitioning methods (Range, Hash, List, Range-Hash and Range-List) and three index partitioning options (global non-partitioned, global partitioned and local partitioned). Important considerations before partitioning a table or an index are;
– Is the table the “right” size and how volatile is its data?
– What are your maintenance considerations?
– How do queries filter on this data, and which column should be the partition key?

To benefit from the partitioning option, as with the parallelism discussed in part 1, it is vital that you have a goal you want to achieve with it. Also remember that “Partitioning <> fast=true”.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.2.0

References Used :
http://www.psoug.org/reference/partitions.html
Oracle® Database Concepts 10g Release 2 (10.2) Chapter 18 Partitioned Tables and Indexes
Oracle® Database Administrator’s Guide 10g Release 2 (10.2) Chapter 17 Managing Partitioned Tables and Indexes
Drop parent partition post by Jonathan Lewis
Effective Oracle by Design by Thomas Kyte Chapter 3-Architectural Decisions, Page 185


January 9, 2007

Oracle Business Intelligence and Extraction, Transformation, Loading(ETL) Part 1

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 10:27 am

1- Introduction to Parallel Execution

Oracle 7.1 introduced the parallel query option in 1994. A full scan is usually a good candidate for parallel query, provided you are not on an I/O-bound system.

Also, contrary to what is often perceived, you do NOT need more than one CPU in order to use this option; in other words, a degree of parallelism greater than the number of CPUs on the system can still be useful.

Consider an index build: you read, sort and write. While you are reading, you are not using the CPU, so it would be great if another process were sorting at the same time. Likewise, while you are sorting, you are not using the disk, so it would be great if another process were reading or writing.

Parallel execution dramatically reduces response time for data-intensive operations on large databases that are typically associated with decision support systems (DSS) and data warehouses. You can also implement parallel execution on certain types of online transaction processing (OLTP) and hybrid systems. Parallel execution improves processing for:
• Queries requiring large table scans, joins, or partitioned index scans
• Creation of large indexes
• Creation of large tables (including materialized views)
• Bulk inserts, updates, merges, and deletes
• Sorting large volumes of data
You can also use parallel execution to access object types within an Oracle database. For example, you can use parallel execution to access large objects(LOBs).

You can use parallel execution for any of the following:
• Access methods: Some examples are table scans, index full scans, and partitioned index range scans.
• Join methods: Some examples are nested loop, sort merge, hash, and star transformation.
• DDL statements: Some examples are CREATE TABLE AS SELECT, CREATE INDEX, REBUILD INDEX, REBUILD INDEX PARTITION, and MOVE/SPLIT/COALESCE PARTITION. You can normally use parallel data definition language (DDL) where you use regular DDL. There are, however, some additional details to consider when designing your database. One important restriction is that parallel DDL cannot be used on tables with object or LOB columns. All of these DDL operations can be performed in NOLOGGING mode for either parallel or serial execution. The CREATE TABLE statement for an index-organized table can be parallelized either with or without an AS SELECT clause. Different parallelism is used for different operations. The Parallel CREATE (PARTITIONED) TABLE… AS SELECT and parallel CREATE (PARTITIONED) INDEX statements run with a degree of parallelism equal to the number of partitions. Parallel operations require accurate statistics to perform optimally.
• DML statements: Some examples are INSERT AS SELECT, updates, deletes, and MERGE operations. Parallel DML (parallel insert, update, merge, and delete) uses parallel execution mechanisms to speed up or scale up large DML operations against large database tables and indexes. You can also use INSERT … SELECT statements to insert rows into multiple tables as part of a single DML statement. You can normally use parallel DML where you use regular DML. Although DML normally includes queries, the term parallel DML refers only to inserts, updates, merges, and deletes done in parallel.
• Miscellaneous SQL operations: Some examples are GROUP BY, NOT IN, SELECT DISTINCT, UNION, UNION ALL, CUBE, and ROLLUP, as well as aggregate and table functions.
• Parallel query: You can parallelize queries and subqueries in SELECT statements, as well as the query portions of DDL statements and DML statements (INSERT, UPDATE, DELETE, and MERGE).
• SQL*Loader: You can parallelize the use of SQL*Loader where large amounts of data are routinely encountered.
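As a rough illustration of the parallel DDL and parallel DML cases above, here is a minimal hedged sketch; the table names (sales_copy, sales, sales_staging) are made up and the degree of 4 is purely illustrative:

-- parallel DDL: a CTAS run in parallel and in NOLOGGING mode (illustrative names)
CREATE TABLE sales_copy PARALLEL 4 NOLOGGING AS
SELECT * FROM sales;

-- parallel DML: must be enabled at the session level first
ALTER SESSION ENABLE PARALLEL DML;
INSERT /*+ APPEND PARALLEL(sales_copy, 4) */ INTO sales_copy
SELECT * FROM sales_staging;
COMMIT;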

When to Implement and When NOT to Implement Parallel Execution

The benefits of parallel execution can be seen in DSS and data warehousing environments. OLTP systems can also benefit from parallel execution during batch processing and during schema maintenance operations such as creation of indexes. The average simple data manipulation language (DML) or SELECT statements that characterize OLTP applications would not see any benefit from being executed in parallel.

Parallel execution is NOT normally useful for:
• Environments in which the typical query or transaction is very short (a few seconds or less). This includes most online transaction systems. Parallel execution is not useful in these environments because there is a cost associated with coordinating the parallel execution servers; for short transactions, the cost of this coordination may outweigh the benefits of parallelism.
• Environments in which the CPU, memory, or I/O resources are already heavily utilized. Parallel execution is designed to exploit additional available hardware resources; if no such resources are available, then parallel execution does not yield any benefits and indeed may be detrimental to performance.

How Parallel Execution Works

Parallel execution divides the task of executing a SQL statement into multiple small units, each of which is executed by a separate process. The incoming data can be divided into parts (called granules). The user shadow process that is going to execute a query in parallel takes on the role of parallel execution coordinator or query coordinator. The query coordinator does the following:
• Parses the query and determines the degree of parallelism
• Allocates one or two sets of slaves (threads or processes)
• Controls the query and sends instructions to the PQ slaves
• Determines which tables or indexes need to be scanned by the PQ slaves
• Produces the final output to the user
At execution time, the coordinator also performs the parts of the plan that execute serially (such as accessing tables in serial if they are small or have no hint or degree of parallelism set). Ranging is also done serially to determine the ranges of keys to be distributed from producer slaves to consumer slaves who are sorting or otherwise must consume specific ranges of rows.

When using EXPLAIN PLAN with parallel queries, one parallel plan is compiled and executed. This plan is derived from the serial plan by allocating row sources specific to the parallel support in the Query Coordinator (QC) plan. The table queue row sources (PX Send and PX Receive), the granule iterator, and buffer sorts, required by the two slave set PQ model, are directly inserted into the parallel plan. This plan is the exact same plan for all the slaves if executed in parallel or for the QC if executed in serial.


CREATE TABLE emp2 AS SELECT * FROM employees;
ALTER TABLE emp2 PARALLEL 2;

EXPLAIN PLAN FOR
SELECT SUM(salary) FROM emp2 GROUP BY department_id;
SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY());

--------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes | Cost (%CPU) |    TQ  |IN-OUT| PQ Distrib |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |   107 |  2782 |     3 (34)  |        |      |            |
|   1 |  PX COORDINATOR          |          |       |       |             |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001 |   107 |  2782 |     3 (34)  |  Q1,01 | P->S | QC (RAND)  |
|   3 |    HASH GROUP BY         |          |   107 |  2782 |     3 (34)  |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE           |          |   107 |  2782 |     3 (34)  |  Q1,01 | PCWP |            |
|   5 |      PX SEND HASH        | :TQ10000 |   107 |  2782 |     3 (34)  |  Q1,00 | P->P | HASH       |
|   6 |       HASH GROUP BY      |          |   107 |  2782 |     3 (34)  |  Q1,00 | PCWP |            |
|   7 |        PX BLOCK ITERATOR |          |   107 |  2782 |     2 (0)   |  Q1,00 | PCWP |            |
|   8 |         TABLE ACCESS FULL| EMP2     |   107 |  2782 |     2 (0)   |  Q1,00 | PCWP |            |
--------------------------------------------------------------------------------------------------------

The table EMP2 is scanned in parallel by one set of slaves while the aggregation for the GROUP BY is done by the second set. The PX BLOCK ITERATOR row source represents the splitting up of the table EMP2 into pieces so as to divide the scan workload between the parallel scan slaves. The PX SEND and PX RECEIVE row sources represent the pipe that connects the two slave sets as rows flow up from the parallel scan, get repartitioned through the HASH table queue, and then read by and aggregated on the top slave set. The PX SEND QC row source represents the aggregated values being sent to the QC (Query Coordinator) in random (RAND) order. The PX COORDINATOR row source represents the QC or Query Coordinator which controls and schedules the parallel plan appearing below it in the plan tree.

Degree of Parallelism (DOP)

The parallel execution coordinator may enlist two or more of the instance’s parallel execution servers to process a SQL statement. The number of parallel execution servers associated with a single operation is known as the degree of parallelism. A single operation is a part of a SQL statement, such as an ORDER BY clause or a full table scan performed as part of a join on a non-indexed column. The degree of parallelism applies directly only to intra-operation parallelism. If inter-operation parallelism is possible, the total number of parallel execution servers for a statement can be twice the specified degree of parallelism. No more than two sets of parallel execution servers can run simultaneously. Each set of parallel execution servers may process multiple operations. Only two sets of parallel execution servers need to be active to guarantee optimal inter-operation parallelism.

Parallel execution is designed to effectively use multiple CPUs and disks to answer queries quickly. When multiple users use parallel execution at the same time, it is easy to quickly exhaust available CPU, memory, and disk resources. The default DOP is used when you ask to parallelize an operation but you do not specify a DOP in a hint or in the definition of a table or index. The default DOP is appropriate for most applications.

If no parallel hints are used and there is no default degree of parallelism for the table in the dictionary:
• Execution for that table is serial
• When parallelism is forced with the ALTER SESSION FORCE PARALLEL … command without an explicit degree, the DOP for a SQL statement defaults to the value of CPU_COUNT (by default the number of CPUs on the system) multiplied by the value of the PARALLEL_THREADS_PER_CPU parameter.
However, the actual number of processes that are used is limited by their availability on the requested instances during run time. The PARALLEL_MAX_SERVERS initialization parameter sets an upper limit on the total number of parallel execution servers that an instance can have.
If a minimum fraction of the desired parallel execution servers is not available (specified by the PARALLEL_MIN_PERCENT initialization parameter), a user error is produced. You can retry the query when the system is less busy.
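As a small hedged sketch of forcing parallelism at the session level (the degree of 4 is only an example):

-- force parallel query with an explicit degree
ALTER SESSION FORCE PARALLEL QUERY PARALLEL 4;
-- or force it without a degree, so the default DOP
-- (CPU_COUNT * PARALLEL_THREADS_PER_CPU) is used
ALTER SESSION FORCE PARALLEL QUERY;
-- back to the default behavior
ALTER SESSION ENABLE PARALLEL QUERY;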

Parallelization Rules for SQL Statements

A SQL statement can be parallelized if it includes a parallel hint or if the table or index being operated on has been declared PARALLEL with a CREATE or ALTER statement. In addition, a DDL statement can be parallelized with the PARALLEL clause. However, not all of these methods apply to all types of SQL statements. Parallelization has two components: the decision to parallelize and the DOP. These components are determined differently for queries, DDL operations, and DML operations.

To determine the DOP, the Oracle Server looks at the reference objects:
• Parallel query looks at each table and index in the portion of the query that is being parallelized to determine which is the reference table. The basic rule is to pick the table or index with the largest DOP.
• For parallel DML (INSERT, UPDATE, MERGE, and DELETE), the reference object that determines the DOP is the table being modified by an insert, update, or delete operation. Parallel DML also adds some limits to the DOP to prevent deadlock. If the parallel DML statement includes a subquery, the subquery’s DOP is the same as the DML operation.
• For parallel DDL, the reference object that determines the DOP is the table, index, or partition being created, rebuilt, split, or moved. If the parallel DDL statement includes a subquery, the subquery’s DOP is the same as the DDL operation.

Disabling Parallel Execution

You disable parallel SQL execution with an ALTER SESSION DISABLE PARALLEL DML|DDL|QUERY statement. All subsequent DML (INSERT, UPDATE, DELETE), DDL (CREATE, ALTER), or query (SELECT) operations are executed serially after such a statement is issued. They are executed serially regardless of any PARALLEL clause that is associated with the statement or any parallel attribute that is associated with the table or indexes involved.

The following statement disables parallel DDL operations:
ALTER SESSION DISABLE PARALLEL DDL;

You can also use the ALTER statement to change the PARALLEL state of tables and indexes to NOPARALLEL.
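For example, a hedged sketch reusing the EMP2 table created above (the index name is made up):

ALTER TABLE emp2 NOPARALLEL;
ALTER INDEX emp2_dept_idx NOPARALLEL;  -- hypothetical index name
-- verify the table's degree afterwards
SELECT table_name, degree FROM user_tables WHERE table_name = 'EMP2';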

Hints for Parallel Execution

PARALLEL: The PARALLEL hint lets you specify the desired number of concurrent servers that can be used for a parallel operation. The hint applies to the SELECT, INSERT, UPDATE, and DELETE portions of a statement as well as to the table scan portion.
NO_PARALLEL: The NO_PARALLEL hint overrides a PARALLEL specification in the table clause.
PQ_DISTRIBUTE: The PQ_DISTRIBUTE hint improves the performance of parallel join operations. The optimizer ignores the distribution hint if both tables are serial.
PARALLEL_INDEX: The PARALLEL_INDEX hint specifies the desired number of concurrent servers that can be used to parallelize index range scans for partitioned indexes.
NO_PARALLEL_INDEX: The NO_PARALLEL_INDEX hint overrides a PARALLEL attribute setting on an index to avoid a parallel index scan operation.

So what is the best technique to determine the degree of parallelism (DOP) to use?

Mr.Kyte‘s answer to this question is; “don’t, let the database do it. automatic tuning (in 9i, the default in 10g) and adaptive multiuser. Just make things “parallel” or use the “parallel hint” without a degree and let the database do it. If you have the resources, it’ll use them. If you have too many concurrent users to do a parallel 8 query, it’ll downgrade it to parallel 4 or maybe just serial.”
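A hedged sketch of that advice, reusing the EMP2 table from the example above: mark the object PARALLEL with no fixed degree, or hint without a degree, and let the database decide:

-- default (automatic) degree on the table
ALTER TABLE emp2 PARALLEL;
-- or per statement, a PARALLEL hint without a degree
SELECT /*+ PARALLEL(e) */ department_id, SUM(salary)
FROM emp2 e
GROUP BY department_id;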

Another bad habit: with hints you may grab lots of parallel servers, but this will not necessarily reduce response time; synchronization between the slaves usually results in more waits and causes response time to grow. Before starting a parallel query you can check the available parallel servers with these queries;


-- how many parallel servers can be opened
SELECT NAME, VALUE
FROM v$parameter
WHERE NAME LIKE '%paral%max%';

-- how many parallel servers are being used by whom
SELECT a.qcsid, a.qcserial#, y.osuser, COUNT(*)
FROM v$px_session a, v$session y
WHERE y.sid = a.qcsid
AND y.serial# = a.qcserial#
GROUP BY a.qcsid, a.qcserial#, y.osuser ;

With Oracle Database versions 8.1.5 and later we can also parallelize PL/SQL functions. The PARALLEL_ENABLE keyword is the preferred way to mark your code as safe for parallel execution. This keyword is syntactically similar to DETERMINISTIC and is placed after the return type in the function declaration:


CREATE FUNCTION F1 (P1 NUMBER) RETURN NUMBER PARALLEL_ENABLE IS
BEGIN
  RETURN P1 * 2;
END;
/
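A parallel-enabled function can then be called safely from within a parallel query; a hedged usage sketch against the EMP2 table above (the degree of 2 is illustrative):

SELECT /*+ PARALLEL(e, 2) */ f1(e.salary)
FROM emp2 e;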

In conclusion, to benefit from the parallelism option there are some important considerations to keep in mind; as Thomas Kyte puts it, “Parallel <> fast=true, never has been, never will.”

Continue reading with Part 2

References Used :
Oracle® Database Application Developer’s Guide – Fundamentals 10g Release 2 (10.2) Chapter 7 Coding PL/SQL Procedures and Packages Parallel Query and Parallel DML
Oracle® Database Concepts 10g Release 2 (10.2) Chapter 16 Business Intelligence Overview of Parallel Execution
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2) Chapter 19 Using EXPLAIN PLAN Viewing Parallel Queries with EXPLAIN PLAN
“Performance in Parallel query” thread on Asktom
“Parallel Query concerns” thread on Asktom

September 28, 2008

On SAGE and Oracle’s new 11g SQL Tuning Workshop Education Content

Filed under: Oracle 11g New Features,Oracle Performance — H.Tonguç Yılmaz @ 5:48 pm

Most probably you think SAGE (this was its project name before Larry’s announcement; it is much more compact, so I prefer it :) –Oracle Exadata Storage Server and the HP Oracle Database Machine– will not change your daily life, and maybe you are unsatisfied with this announcement, but with this post I promise to give you something cool whether you are a massive OLAP developer or an XE developer. :)

But first let me share the first response to SAGE from one of my friends, a warehouse developer: “Oh, I knew for years that Oracle should work much faster, the fault was not ours you see!” :) So do these new features promise to reduce the importance of warehouse development best practices? I really don’t think so..

So on my 22-hour way back home I had a chance to look at this new Oracle University education, and I strongly think that this is the best content you may get from a tuning education. First, with this post, let me briefly explain the table of contents; then I think I will be doing a series of blog posts parallel to this content.

If you took 10g’s SQL Tuning Workshop, the 11g content now has not only new features but also new chapters, like a whole chapter dedicated to Star Transformation and a very long chapter dedicated to optimizer operations such as the access path options of the Oracle database. I have tried to motivate developers around me not to do any SQL tuning afterwards, but to tune as they develop, and I am very happy to see this chapter as it is a part I always dreamed of. :)

Let me mention another interesting observation of mine: if you attended Jonathan Lewis’s optimizer seminars, I think you will agree that the content and even some pictures in this education are very similar to that material; to my understanding this is a kind of approval for people like Mr. Lewis who are trying to improve the community’s understanding of complex technologies like the optimizer. No pain no gain, right? So I do not think this education will be easily consumed by beginners as it now stands.

By the way, since we own one of the largest warehouses in the world, there is a very real possibility that we will be testing and using Exadata, so I may be blogging something more than “reading billions of rows in seconds”. At the end of the day ETL (I mean lots of writes), sort operations, parallel scans and hash joins are what will make the difference in total, since lots of reports (I mean reads) are already tuned to be fast enough by logical design and access path options like partitioning, bitmap indexes and materialized views in today’s optimized warehouses.

August 16, 2008

How to choose which sessions to attend at Oracle OpenWorld 2008?

Filed under: Oracle Events — H.Tonguç Yılmaz @ 7:41 am

My OOW profile has been updated to include access to Oracle Develop. So I begin to use the schedule builder to select my OOW 2008 sessions.

There are a lot of interesting sessions, so I decided to go over just the Application Development and Data Warehousing groups to narrow down the sessions I might be interested in, but it was still hard for me to choose some sessions over their alternatives. Just a small note here: creating a personal schedule does not guarantee access to those sessions, and it is advised to arrive at least 10 minutes before the session start time to ensure access.

I will still be needing some additional slots for the Unconference tradition, the 11g Exam Cram Session at the OCP Lounge, ACE hours at the OTN Lounge, and of course I want to meet some of my friends from the Oracle Community. I feel like Tom Cruise in Mission Impossible, but I am still open to other session advice :)


Sunday, Sep 21

08:30-10:00 	Getting Started with Oracle Fusion Middleware (IOUG)
10:00-11:30 	Oracle Application Express SIG: Ask the Experts (IOUG)
11:45-12:45 	Using Oracle Database 11g and Oracle Application Express to Change Business Practices and Realize ROI 

13:00-14:30 	Oracle Database Vault for SAP (IOUG)
14:30-15:30 	Efficient Schema Design
15:30-17:00 	OCP Lounge - ORACLE DATABASE 11G EXAM CRAM

Monday, Sep 22

09:00-10:00 	UNCONFERENCE - Oracle Application Express Test Drive for DBAs and PL/SQL Developers (Lead: H.Tonguç Yılmaz)
10:00-11:00 	UNCONFERENCE - Capacity Planning on Oracle DB (Ignacio Ruiz)
11:30-12:30 	Growing a Data Warehouse to 50 Terabytes and Beyond 

13:00-14:00 	Faster and Smarter Data Warehouses with Oracle Database 11g (IOUG)
14:30-15:30 	Data Warehousing at 300 Terabytes on Oracle Real Application Clusters
16:00-17:00 	Best Practices for Deploying a Data Warehouse on Oracle Database 11g
17:30-18:30 	Web 2.0 Development with Oracle Application Express 

Tuesday, Sep 23

09:00-10:00 	Inside Oracle Database 11g Optimizer: Removing the Mystery
10:00-11:00 	UNCONFERENCE - Release 12 Apps DBA 101 (Lead: John Stouffer)
11:30-12:30 	Advanced Performance Diagnostics: What the GUI Doesn’t Tell You

13:00-14:00 	Oracle Database 11g: Stories from a Data Warehouse Implementation
14:00-15:00 	UNCONFERENCE - How to use native database REST Web Services in Oracle 11g by Marcelo F. Ochoa
14:00-15:00 	UNCONFERENCE - So, You Want To Be An Oracle ACE? by Dan Norris
17:00-18:00 	Practical Data Warehouse Experiences with Oracle Database 11g

Wednesday, Sep 24

09:00-10:00 	Successfully Developing and Integrating Applications in Oracle Application Express in Oracle E-Business Suite
11:30-12:30 	Oracle Optimized Warehouse Solutions for Sun: Accelerate Your Data Warehouse Implementation 

13:00-14:00 	The Oracle Database 11g Data Warehouse Toolkit (IOUG)
14:30-16:30 	Keynote: Larry Ellison, Oracle and Mark Hurd, HP
16:00-17:00 	UNCONFERENCE - App Server basics for the DBA (Lead: Hans Forbrich)
17:00-18:00 	Building Commercial Software-as-a-Service Applications with Oracle Application Express

Thursday, Sep 25

09:00-10:00 	Dispelling Myths About Oracle Application Express (IOUG)
11:00-12:00 	OTN Lounge - Oracle ACE Office Hours in the OTN Lounge
12:00-13:00 	Oracle's New Database Accelerator: Query Processing Revolutionized 

13:30-14:30 	Oracle Business Intelligence Strategy for the Data-Warehouse-Enabled Enterprise
15:00-16:00 	Oracle Real Application Clusters and QLogic InfiniBand: Yahoo! Large-Scale Data Warehouse

Very little time is left now; I will do my best to get the most out of this beautiful city and event, and I hope to meet you there. :)

ps: Recently I saw that Arup Nanda has started blogging; after reading his OTN and Oracle Magazine articles, this is great news for me. http://arup.blogspot.com/

another ps: OTN at Oracle OpenWorld: A Condensed Guide

Oracle Develop
September 21-23, San Francisco Marriott

OTN Lounge
Open 9 a.m.-5 p.m., September 21-25, Moscone West, Level 3

Oracle OpenWorld Unconference
Open 9 a.m.-5 p.m., September 21-25, Moscone West, Level 3

OTN Night
7:30-11 p.m., September 22, San Francisco Hilton

The Oracle OpenWorld community experience aggregator

January 21, 2008

Oracle OLAP features performance testing Part 1

Filed under: Oracle Business Intelligence,Oracle Performance — H.Tonguç Yılmaz @ 10:26 am

Since I grew up in a small seaside village I have always loved fishing. But when I was young it was always hard to wait for a fish, so I preferred diving; instead of waiting I went after the fish :) After years of OLTP performance testing, OLAP performance testing was a similar experience: I had to train myself for patience again. Here in Türkiye there is an old saying; “Only a patient dervish attains his wish.”

Before getting into details: I have mentioned the importance of testing several times, and again this time the results proved that even things written in the documentation may not be optimal for your case.

We are one of the largest telecom operators in Europe and our data warehouse database is of course a big one. We are on Solaris 10, using Hitachi storage devices with Oracle 9i Release 2. With the company’s aggressive marketing needs our data warehouse had to be restructured from bottom to top. So this was a great opportunity for me to advise both moving to 10g Release 2 and using Oracle’s OLAP features like star transformation and materialized views, which the development team had tested once upon an Oracle release and did not like the outcome.

There was no need to upgrade; for the new 10g Release 2 environment we had the luxury of setting up a completely separate database. This was of course a big advantage: not upgrading a xx TB database, getting rid of the old fragmented dictionary and starting with a fresh one.

Hüsnü, one of the two DBAs, ran lots of interesting load tests with Orion, and I hope he will also be blogging about some of the interesting results he got. Below are some of the important resources that helped us during our tests;

Oracle’s Data Warehousing Documentation
Oracle Database 10g: The Top 20 Features for DBAs by Arup Nanda
Oracle By Example Series – Business Intelligence and Data Warehousing
Using Basic Database Functionality for Data Warehousing
Using Oracle’s Materialized Views Capabilities

So the initial step was to set up a test environment for a specific end-user reporting need; after getting the example queries for two big fact tables and several dimension tables, we immediately started to load several partitions of the facts and all of the dimension data. Our old friend SQL*Loader, with its direct path and nologging options, was there for us; I believe it is one of the best tools Oracle has ever developed. So here, briefly, are the next steps of the tests for your information and comments;

a. Collecting Object and System Statistics after the load

With 10g the CBO is the only path; the RBO is no longer supported. So we gathered object statistics at the schema, dictionary and fixed object levels;

-- schema level
exec dbms_stats.gather_schema_stats( ownname =>'ABC', degree=>16, options=>'GATHER AUTO', estimate_percent=>dbms_stats.auto_sample_size, cascade=>TRUE, method_opt=>'FOR ALL COLUMNS SIZE AUTO', granularity=>'ALL');
-- dictionary level
exec dbms_stats.gather_dictionary_stats( degree=>16, options=>'GATHER AUTO', cascade=>TRUE);
-- fixed table level
exec dbms_stats.gather_fixed_objects_stats ;

And also collected system statistics;

SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';
exec dbms_stats.gather_system_stats('start');
-- produce some load, for both parallel multi-block and single-block reads
exec dbms_stats.gather_system_stats('stop');
SELECT pname, pval1 FROM sys.aux_stats$ WHERE sname = 'SYSSTATS_MAIN';

As of 10g there is an automated scheduler job to gather statistics (since the RBO is no longer supported). To be more in control we disabled this job;

SELECT owner, job_name,enabled
FROM DBA_SCHEDULER_JOBS
WHERE JOB_NAME = 'GATHER_STATS_JOB';

BEGIN
  DBMS_SCHEDULER.DISABLE('GATHER_STATS_JOB');
  COMMIT;
END;
/

b. Automatic Storage Management(ASM) and optimizer parameters

Not only does it eliminate the performance and budget cost of a file system layer, but especially because of its management advantages, ASM with 10g Release 2 was another of our test topics. Hüsnü may blog about his Orion test results of ASM versus the Veritas file system; here I will only mention that we used external redundancy and set two important ASM parameters as Note:368055.1 advised;

Subject: Deployment of very large databases (10TB to PB range) with Automatic Storage Management (ASM)
_asm_ausize=16777216
_asm_stripesize=1048576

ASM disks are divided into allocation units of 1MB by default, but for data warehouses setting _asm_ausize higher, for example to 16MB, is important. Also, like statistics for the optimizer, there are several important settings which have to be checked for OLAP workloads on 10g Release 2;

SELECT name, value FROM v$system_parameter a
WHERE a.NAME IN ('compatible', 'optimizer_features_enable', 'optimizer_mode', 'parallel_execution_message_size',
        'pga_aggregate_target', 'workarea_size_policy', 'query_rewrite_enabled',
        'query_rewrite_integrity', 'parallel_max_servers', 'disk_asynch_io',
        'db_file_multiblock_read_count', 'star_transformation_enabled');

Here we left our 9iR2 parameters behind and decided to move on with fresh parameters for 10gR2; our strategy was to leave a parameter at its default unless tests proved that it had to be changed. Hüsnü again made several detailed tests for parameters like db_file_multiblock_read_count and parallel_execution_message_size, and we chose the optimum values from those results.

c. Bitmap indexes and star transformation

After all the initial steps above come the segment-level setups. As we know, bitmap indexes are best suited for OLAP usage, since their bitmap-level (not row-level) locking behavior makes them a poor fit for concurrent OLTP writes. Also, for a good bitmap index the cardinality, the number of distinct values in the column, is important.

For Oracle to do star transformation (a hedged sketch follows this list),
1- We created single-column bitmap indexes on all of the fact table’s dimension key columns. The filters in the query scenarios were all on these bitmap-indexed fact columns joining to the dimensions.
2- We created unique key constraints on the dimension tables’ join columns.
3- We also created appropriate indexes on the filtered dimension columns.
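A minimal hedged sketch of this setup; all object names (sales_fact, customer_dim, product_dim and the index names) are made up for illustration:

ALTER SESSION SET star_transformation_enabled = TRUE;

-- 1- single-column bitmap indexes on the fact table's dimension key columns
CREATE BITMAP INDEX sales_fact_cust_bix ON sales_fact (customer_id) NOLOGGING;
CREATE BITMAP INDEX sales_fact_prod_bix ON sales_fact (product_id) NOLOGGING;

-- 2- unique keys on the dimension tables' join columns
ALTER TABLE customer_dim ADD CONSTRAINT customer_dim_pk PRIMARY KEY (customer_id);
ALTER TABLE product_dim  ADD CONSTRAINT product_dim_pk  PRIMARY KEY (product_id);

-- 3- regular indexes on the dimension columns used as filters
CREATE INDEX customer_dim_city_idx ON customer_dim (city);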

Here we were excited to see whether the bitmap index on the second large fact table’s id column would be used or not; because of its high cardinality it was expected to have a higher cost than the full parallel table scan and hash join alternative. When the filters were selective, returning at most 5% to 10% of the fact table, star transformation always performed better than the full parallel table scan and hash join alternative, and the bitmap index on the second fact table’s reference column was indeed used by the optimizer. When the filtering was not selective, the full parallel table scan and hash join took advantage of multi-block reads and performed much better.

The SQL scripts below were run several times and we took the average values; we needed to write some shell scripts in order to keep the tests going at night :)

himalaya@oracle $ nohup sqlplus ../.. @...sql > ...log 2> ...err &

-- inside sql
-- full-parallel table scan and hash join test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_full ;

-- be sure that this test is not doing star trans. or query rewr.
ALTER SESSION SET star_transformation_enabled=FALSE;
SELECT /*+ norewrite use_hash(cs,ct,tr,ta,pa,sr,dc) parallel(cs) parallel(ct) parallel(tr) parallel(ta) parallel(pa) parallel(sr) parallel(dc) */
/* SENARYO 1 FULL */ ..

-- star transformation test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_bi ;

-- be sure that this test is not doing query rewr. and star trans. is enabled
ALTER SESSION SET star_transformation_enabled=TRUE;
SELECT /*+ norewrite STAR_TRANSFORMATION */
/* SENARYO 1 BI */ ...

d. Materialized views and query rewrite

The second option we wanted to see for read performance was materialized views. Of course, since these are pre-calculated summaries, they beat both the star transformation and full access alternatives.

-- materialized view test script
spool ...log
conn ../..
SET LINESIZE 2000
SET AUTOTRACE TRACEONLY
SET TIME ON
SET TIMING ON
ALTER SESSION ENABLE PARALLEL DML ;
alter session set max_dump_file_size=unlimited ;
alter session set timed_statistics = true ;
-- alter session set STATISTICS_LEVEL = ALL ;
-- alter session set "_rowsource_execution_statistics" = true ;
alter session set tracefile_identifier = senaryo1_mv ;

-- force to use mat.view
ALTER SESSION SET star_transformation_enabled=FALSE;
SELECT /*+ rewrite_or_error */
/* SENARYO 1 MV */ ..

e. Microstrategy reporting tool

We use MSTR for end-user reporting needs, so we needed to be sure of MSTR’s behavior on 10gR2. One question was whether its driver was up to date; more importantly, since MSTR also does some query rewriting and caching of its own, would the same queries coming from MSTR have the same execution plans as in our SQL*Plus tests? This was our primary concern.

Here we created a database logon trigger for the MSTR database user to capture the queries it produced, to be sure of what was going on;

CREATE OR REPLACE TRIGGER trace_trig
AFTER LOGON
ON DATABASE
DECLARE
   sqlstr VARCHAR2(200) := 'ALTER SESSION SET EVENTS ''10046 TRACE NAME CONTEXT FOREVER, LEVEL 8''';
BEGIN
   IF (USER = 'MSTR') THEN
      execute immediate sqlstr;
   END IF;
END trace_trig;
/

And we saw that MSTR was adding some dimension key columns to the original queries, which affected the materialized view query rewrite option. Also, MSTR was experiencing many more waits compared to the simple SQL*Plus client, which will be another area to study more deeply later.

In this first part I wanted to briefly explain our testing needs, setups and read performance tests. In the second part the more important issues will be the topic: the write cost of each option due to daily ETL and data quality updates.

Especially the incremental refresh results using MATERIALIZED VIEW LOGs and DBMS_MVIEW.REFRESH will be important, since materialized views gave the best results during the read tests in part 1. Here we will also be getting the most out of 10gR2, with the DBMS_ADVISOR.TUNE_MVIEW, DBMS_ADVISOR.CREATE_FILE and DBMS_MVIEW.EXPLAIN_MVIEW supplied functionality.

PS : I also want to recommend these three blogs, which I think are really good starting points for data warehousing with Oracle;

– The Oracle Sponge – Oracle Data Warehouse Design and Architecture by David Aldridge – http://oraclesponge.wordpress.com/
– Pete Scott’s random notes – The rantings of a middle-aged computer consultant and generally nice person – http://pjsrandom.wordpress.com/
– Rittman Mead Consulting – “Delivering Oracle Business Intelligence” – http://www.rittmanmead.com/blog/

August 27, 2007

Materialized Views – Summary Management with Oracle

Filed under: Oracle Business Intelligence — H.Tonguç Yılmaz @ 4:48 am

In this blog’s Oracle Business Intelligence category I previously introduced both Parallel Execution and Partitioning, two epic and perhaps the most widely used Oracle options in BI environments.

Using materialized views is a critical performance tool for data warehousing, decision support and reporting needs; they are also widely used to replicate and distribute data. By pre-calculating the answers to the really hard questions, and with the query rewrite feature, we greatly reduce the load on the system, and these pre-computed summaries are typically very small compared to the original source data. Query rewrite is a mechanism by which Oracle transparently improves query response time for the end user or application by automatically rewriting the SQL query to use the materialized view instead of accessing the original tables –

Code Listing 121 – Materialized View Basics Demo

A materialized view can be partitioned and indexed, large materialized views can be created in parallel, and you may use materialized view logs for incremental refreshes. They are also very similar to indexes in several ways –
* They consume storage space,
* They must be refreshed when the data in their master tables changes,
* They improve the performance of SQL execution when they are used for query rewrites,
* Their existence is transparent to SQL applications and users,
* They have a performance impact on write operations.
Unlike indexes, materialized views can be accessed directly using a SELECT statement. Depending on the types of refresh that are required, they can also be accessed directly in an INSERT, UPDATE, or DELETE statement.
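As a hedged illustration (the table and column names are made up), a summary materialized view that is eligible for query rewrite might look roughly like this:

CREATE MATERIALIZED VIEW sales_by_month_mv
  BUILD IMMEDIATE
  REFRESH FORCE ON DEMAND
  ENABLE QUERY REWRITE
AS
SELECT TRUNC(sale_date, 'MM') AS sale_month, SUM(amount) AS total_amount
FROM sales_fact
GROUP BY TRUNC(sale_date, 'MM');

With query_rewrite_enabled set, a query that aggregates sales_fact by month can then be transparently redirected to this summary.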

Managing materialized views after 10g is much easier; if you are interested in the dbms_advisor.tune_mview, dbms_advisor.create_file and select /*+ REWRITE_OR_ERROR */ .. stuff, please check Arup Nanda‘s article; Oracle Database 10g: The Top 20 Features for DBAs – Week 12 – Materialized Views

Also this week in the oracle-l group, which I mentioned earlier in one of my posts, “Materialized View Refresh method has been changed from 9i to 10g” was a hot topic. In summary: in 9i, for complete refreshes Oracle did a truncate of the materialized view followed by an insert /*+ append */, whereas in 10g it does a delete and a conventional insert, causing performance overhead. This was also discussed in this AskTom thread. As Jared Still mentioned in one of his follow-ups, by setting atomic_refresh => false in the refresh call, the truncate/append behavior can be restored;

begin
  -- dbms_mview.refresh('MVTEST_MV', method => 'C');
  -- use this with 10g to return to truncate/append behavior
  dbms_mview.refresh('MVTEST_MV', method => 'C', atomic_refresh => false);
end;
/

ps: for those who may be interested, as I promised earlier I published my notes from the two-day Optimizing for Performance seminar by Joze Senegacnik.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
http://www.akadia.com/services/ora_materialized_views.html
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2) – DBMS_ADVISOR.TUNE_MVIEW Procedure
http://psoug.org/reference/dbms_advisor.html

October 2, 2009

Recent changes

Filed under: Oracle Events,Oracle Other — H.Tonguç Yılmaz @ 3:18 am

Now I am managing the BI development team focused on the CDR ETL, ODS and Data Mining areas; my team is called Revenue Oriented Business Intelligence, cool naming, huh? :) Because of my experiences I prefer the title “team leader” to “team manager”; this change is a very exciting new experience for me.

Last month I blogged twice at our corporate blog in Turkish: http://blog.turkcell.com.tr/tag/oracle/ But I can easily say that during 2009 I have become much more of a Twitter guy (short and brief feeds) than a blogger; you can follow my feeds at http://friendfeed.com/tongucy

Next week I will be at OOW09; my sessions will be focused on data warehousing and Exadata as usual. Here is a list of the sessions in this focus area: http://www.oracle.com/ocom/groups/public/documents/webcontent/034315.pdf

I will also be attending several Terabyte Club meetings, and I will do my best to attend the ACE dinner and the Bloggers Meetup this year; OOW is a very good opportunity for networking. So if you will also be around, see you at the OTN Lounge :)

September 22, 2008

Highlights of my first day at OOW2008

Filed under: Oracle Events — H.Tonguç Yılmaz @ 2:36 am

It took ~22 hours for me to get from my home to the Cathedral Hotel on OOW’s Route 2; everything has been cool so far. OOW 2008 is well organized and the city feels like Oracle’s home, Oracle is everywhere here. Below are the highlights from my first day at OOW 2008.

Sess.1 – Getting started with Oracle Fusion Middleware by George Trujillo

During this presentation I sensed George was trying to underline the next big thing, the change we will soon face as database developers and DBAs. Call Forms and Reports developers Developer 1.0; Fusion Middleware developers then belong to the Developer 2.0 era, where the key components are XML, SOA, BPEL, Web Services and J2EE. XML is like English: it is the common language all applications must speak in order to integrate with each other in this architecture. Oracle’s WebLogic purchase was an important step; within Oracle WebLogic Application Server, SOA Suite, Identity Management, Business Intelligence and User Interface (Portal, ADF, JSF) will be the four major areas of interest. All managed from a single point of view, by Grid Control, similar to the RAC environment at the data tier.

In my experience, the experts in these technologies all stand very far from the database, so if Fusion Middleware and these people are the future, I suspect there will be big threats to building successful database applications; on the other hand, for experts who can additionally manage to learn these technologies, there will be important opportunities to fix these problems.

Sess.2 – Extending Oracle Application Express Framework with Web 2.0

This was a hands-on lab based on OTN OBEs for JavaScript and Ajax usage within Apex 3.1.

Sess.3 – All About Encryption by Tom Kyte

This was another very typical Kyte presentation, with demos on the performance and storage effects of the encryption options. The presentation’s focus was three features inside the database;

a. DBMS_CRYPTO: no additional cost, labor intensive, not transparent, not performant when compared to the other two options.

b. Column Level Encryption: Advanced Security Option needed, semi-transparent, cannot use an index with LIKE ‘ABC%’ type queries, storage cost is higher after encryption, much more performant compared to DBMS_CRYPTO.

c. Tablespace Level Encryption: Advanced Security Option needed, completely transparent, performant on both reads and writes, and disk usage does not increase.

With DBMS_OBFUSCATION_TOOLKIT, instead of using the VARCHAR2 APIs it is better to convert to RAW first and always return binary-formatted data. As of 11g, Data Pump can export encrypted data and preserve the protection; leave the old EXP, it will be deprecated soon, but IMP will stay. The undo and redo produced and the data cached are all encrypted as well, but a last-minute question related to temp (sort) data and Kyte promised to blog on this topic. Be careful about legacy information, old backups for example: encrypting today will not protect the data already sitting there.
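To illustrate the “convert to RAW first” advice, here is a minimal hedged sketch using DBMS_CRYPTO; the data and the hard-coded key are for demonstration only:

DECLARE
  l_key RAW(16)   := UTL_RAW.CAST_TO_RAW('0123456789abcdef'); -- demo key only, 16 bytes for AES128
  l_src RAW(2000) := UTL_I18N.STRING_TO_RAW('sensitive data', 'AL32UTF8');
  l_enc RAW(2000);
BEGIN
  l_enc := DBMS_CRYPTO.ENCRYPT(
             src => l_src,
             typ => DBMS_CRYPTO.ENCRYPT_AES128 + DBMS_CRYPTO.CHAIN_CBC + DBMS_CRYPTO.PAD_PKCS5,
             key => l_key);
  DBMS_OUTPUT.PUT_LINE(RAWTOHEX(l_enc));
END;
/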

Sess.4 – Doing SQL from PL/SQL: Best and Worst Practices by Bryn Llewellyn

There is a new OTN paper on this topic; Bryn mentioned some very interesting PL/SQL best practices during this session. This was my best session of the day.

Sess.5 – 11g New Features Exam Cram

I had planned to attend Kyte’s Efficient Schema Design session but changed my mind and moved to the Exam Cram; within 8 hours, all of the 5-day 11gR1 New Features education topics were discussed. It was cool to attend the last 2.5 hours, but I couldn’t get the booklet prepared for the cram; I hope to get one tomorrow.

I have had the chance to chat with three Oracle bloggers so far: Tim Hall on the plane, and Carl Backstrom and Steven Feuerstein during the sessions today. Now it is time for the OOW 2008 bloggers meetup, but everything is spinning around in my head tonight after the first 1.5 days; I hope I can make it. :)

ps: I tried to tweet my sessions as they happened today and will try to continue this until Thursday, so if you want to follow, here is the link; http://twitter.com/TongucY

November 29, 2007

Linux and Shell Scripting Part 2

Filed under: Oracle, Linux and Shell Scripting — H.Tonguç Yılmaz @ 3:08 pm

There are several series I couldn’t get into as much as I wanted last year. BIS, Security, Linux, Apex and Information Integration are some of these; let’s say maybe next year, of course if any time remains after the 11g new features stuff :) So this one goes to the Linux category.

One of my colleagues needed my help with some shell scripting; since I love this, I took a break from the Apex project I mentioned in an earlier post and helped her.

The request was to clone a remote 9i Oracle schema into a local 10g database and make it read-only; a script runnable periodically from unix cron was needed. So the script should involve at least four main parts;
A. sqlplus to drop and recreate local clone user,
B. export remote schema to local dump file,
C. import remote schema from local dump file,
D. sqlplus to revoke all unwanted grants on the schema, so that only SELECT-type privileges remain.

Here are the scripts we used; you may check them, comment on them, and maybe use them as templates for your own needs, because that is what I do whenever I start any kind of Oracle shell scripting task, it saves time :)

clone_user.ksh
step1.sql
step4.sql
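As a hedged illustration of what step D might roughly do (the actual step4.sql linked above differs; the schema name CLONE_USER is made up), a PL/SQL loop revoking everything but SELECT could look like this:

BEGIN
  FOR r IN (SELECT grantee, owner, table_name, privilege
              FROM dba_tab_privs
             WHERE owner = 'CLONE_USER'
               AND privilege <> 'SELECT')
  LOOP
    EXECUTE IMMEDIATE 'REVOKE ' || r.privilege || ' ON "' || r.owner || '"."' ||
                      r.table_name || '" FROM ' || r.grantee;
  END LOOP;
END;
/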

Since some interesting techniques are used, here you will also find my comments on the scripts mentioned above.

comments.txt

During my Apex work over the last several weeks I benefited so much from the OTN Apex forum, a very active and effective forum; whenever I faced a problem, I easily found an exact, detailed solution there. So a bunch of thanks from me goes to the experts below for putting their valuable time into the OTN forums; several times they saved me;

Scott Spadafore
Hans Forbrich
Vikas
Nicolas Gasparotto
Hector Rivera Madrid
Rob van Wijk

It is great to feel your support behind guys, take care! :)

Lately I learned that Steven Feuerstein will be in Istanbul for a two-day seminar; he subscribed to the OracleTURK group and announced his seminar to its members, and it was really fun to monitor the feedback from the group. Steven has lots of fans here in Turkey, me included, so I guess he may find a crowd waiting similar to the one Roberto Carlos experienced on his first arrival at Atatürk Airport after his transfer from Real Madrid to my team Fenerbahçe this year :)

ps : lately two important names, Jonathan Lewis and Howard Rogers, have also been active on the OTN forums. I mention this because you might want to watch their posts through the OTN forums RSS feeds, since each of their posts is usually like an article to study (but be warned, if Burleson is included in the thread then things may get a little complicated :).

Another hint goes for Dizwell; he has his own way of sharing his experiences, which may make reading his long replies extra enjoyable :)

March 17, 2007

Some migration experiences to share

Filed under: Oracle How To,Oracle Performance — H.Tonguç Yılmaz @ 3:39 pm

Let’s say you have a 7*24*365 available subscription database application with ~31 million customers’ data inside. A new version of the application is designed around the new needs and wants of the marketing team, so a new to-be table design is developed, coded and tested.

Now it is time to migrate the data from the as-is table structure to the to-be table structure. Since the operation’s availability is critical, you may have a very limited downtime window or maybe none at all, so you may have to design a migration method which does not require downtime.

There are three important experiences I want to mention here related to this topic;

1- Never make assumptions or promises to internal customer groups based on the timing of some percentage (let’s say 20%) of the data migration done on a pre-production or test environment. Oracle may do things slightly differently on your production environment because of your database settings, especially the cost based optimizer and parallelism settings. Strangely, these differences are usually not in your best interest :)

2- Migration means lots of I/O, especially in terms of redo and undo, so the size of the redo log buffer, the size (and number) of the redo log files, and whether you are archiving or not will have an important impact on the total timing of the migration.

You have to choose a migration methodology which reduces the I/O, and remember the mantra of data-intensive processing: create table as select (CTAS), the MERGE statement or conditional multi-table inserts, and dbms_errlog, combined with the nologging and parallel options, will be a great help; but even then you may have to fall back to PL/SQL, and in that case prefer bulk operations, trust me on this :)

But remember that bulk operations produce redo; nologging is not an option there;

Code listing 71a : PL/SQL Bulk Insert(FORALL) and Redo amount produced demo
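Since code listing 71a is an external link, here is a minimal hedged sketch of a FORALL bulk insert (source_tab and target_tab are made-up names, assumed to have the same structure); note that a conventional-path FORALL insert still generates redo:

DECLARE
  TYPE t_rows IS TABLE OF source_tab%ROWTYPE;
  l_rows t_rows;
  CURSOR c IS SELECT * FROM source_tab;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_rows LIMIT 1000;
    EXIT WHEN l_rows.COUNT = 0;
    FORALL i IN 1 .. l_rows.COUNT
      INSERT INTO target_tab VALUES l_rows(i);
    COMMIT;
  END LOOP;
  CLOSE c;
END;
/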

And as you know, any huge update or delete may be converted into a CTAS with the nologging and parallel options;

Code listing 71b : Converting an Update or Delete into a NoLogging CTAS demo
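Again the listing is external, so here is a hedged sketch of the idea; table and column names are made up, and afterwards you would still need to drop or rename, rebuild indexes, grants and constraints, and gather statistics:

-- rewrite a massive UPDATE as a NOLOGGING, parallel CTAS
CREATE TABLE customers_new NOLOGGING PARALLEL 8 AS
SELECT customer_id,
       CASE WHEN status = 'OLD' THEN 'MIGRATED' ELSE status END AS status,
       signup_date
FROM customers;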

3- Oracle 9iR2 and 10gR2 are two very different platforms; some things that seem common sense may not work on 9i, so be careful and check this for example

So I am waiting for your comments and experiences on the topic. Lately I did some research on the “near-real-time data warehousing” concept and fell in love with Oracle’s Change Data Capture feature, which is based on LogMiner and Streams technologies. I will be sharing my experiences under my Oracle Information Integration series.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.1.0

References Used :
N/A

