Extract(Read) > Transform(Test&Apply) > Load(Learn) Blog

March 31, 2007

Oracle Native Types Part 2

Filed under: Oracle How To — H.Tonguç Yılmaz @ 8:30 pm

Which data type you choose for a given need directly affects both the flexibility and the performance of your database application. This series discusses the native data types of Oracle.

In Part 1 we mentioned;

1- ANYTYPE, ANYDATA and ANYDATASET
2- Binary_double vs. Number in compute intensive processes
3- XMLType and XDB

and how to assign data types to columns with a CREATE TABLE AS SELECT statement.

4- String data types with Oracle

There are two dominant native string data types in Oracle, CHAR and VARCHAR2. The CHAR datatype stores fixed-length character strings, and Oracle compares CHAR values using blank-padded comparison semantics, whereas the VARCHAR2 datatype stores variable-length character strings, and Oracle compares VARCHAR2 values using nonpadded comparison semantics.

This variable-length behavior of the VARCHAR2 datatype has a pitfall: when you enlarge, let's say, a VARCHAR2(80) column's value from 40 characters to 70 and there is no space left in the Oracle block where that row is stored, the row has to move. This concept is called row chaining and migration. When a row is chained or migrated, Oracle must scan more than one data block to retrieve it, so I/O increases and response time increases. You can detect migrated or chained rows by checking the "table fetch continued row" statistic in V$SYSSTAT and the CHAIN_CNT column of the USER_TABLES view. A small number of chained rows (for example less than 1%) is unlikely to impact system performance. However, a large percentage of chained rows can affect read performance, since you have to do much more I/O than you should.
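For a rough check of both indicators mentioned above, something along these lines can be used (the table name T here is just an example; note that CHAIN_CNT is populated by ANALYZE, not by DBMS_STATS, and querying V$SYSSTAT requires the appropriate privileges);

-- system wide counter of continued row fetches
SELECT name, value
  FROM v$sysstat
 WHERE name = 'table fetch continued row';

-- per table chained/migrated row count, refreshed by ANALYZE
ANALYZE TABLE t COMPUTE STATISTICS;

SELECT table_name, num_rows, chain_cnt
  FROM user_tables
 WHERE table_name = 'T';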

There is a second performance problem with VARCHAR2 values. Since the datatype is flexible enough to store variable-length values, you may ask: "why not store a human name :) in a VARCHAR2(4000) column, what can be the impact, since VARCHAR2(40) is already enough for this need?" First of all there are some limitations, like;


drop table t purge ;
create table t ( a varchar2(4000), b varchar2(4000) );
create index t_idx_ab on t(a, b);

create index t_idx_ab on t(a, b)
*
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

And here is an example of the programmatic pitfalls caused by some data processing methods you may use;

Code listing 73 : Memory cost analysis of varchar(4000) column versus varchar(40)
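Code listing 73 is linked above; as a rough idea of the kind of measurement it makes, here is a minimal sketch (it assumes the schema can query V$MYSTAT and V$STATNAME) that bulk collects the same 10,000 short strings into a collection of VARCHAR2(4000) elements and into a collection of VARCHAR2(40) elements, and prints the session PGA memory consumed by each;

set serveroutput on
DECLARE
  TYPE t_wide   IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;
  TYPE t_narrow IS TABLE OF VARCHAR2(40)   INDEX BY PLS_INTEGER;
  l_wide   t_wide;
  l_narrow t_narrow;
  l_before NUMBER;
  -- current value of the 'session pga memory' statistic for this session
  FUNCTION pga RETURN NUMBER IS
    l_val NUMBER;
  BEGIN
    SELECT ms.value INTO l_val
      FROM v$mystat ms, v$statname sn
     WHERE ms.statistic# = sn.statistic#
       AND sn.name = 'session pga memory';
    RETURN l_val;
  END;
BEGIN
  l_before := pga;
  SELECT RPAD('x', 30, 'x') BULK COLLECT INTO l_wide
    FROM all_objects WHERE ROWNUM <= 10000;
  DBMS_OUTPUT.PUT_LINE('varchar2(4000) collection, pga delta: ' || (pga - l_before));

  l_before := pga;
  SELECT RPAD('x', 30, 'x') BULK COLLECT INTO l_narrow
    FROM all_objects WHERE ROWNUM <= 10000;
  DBMS_OUTPUT.PUT_LINE('varchar2(40) collection, pga delta:   ' || (pga - l_before));
END;
/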

Also be careful when comparing or joining columns of these datatypes;


set serveroutput on
DECLARE
  last_name1 VARCHAR2(10) := 'TONGUC';
  last_name2 CHAR(10)     := 'TONGUC';
BEGIN
  IF last_name1 = last_name2 THEN
    DBMS_OUTPUT.PUT_LINE('-' || last_name1 || '- is equal to -' || last_name2 || '-');
  ELSE
    DBMS_OUTPUT.PUT_LINE('-' || last_name1 || '- is NOT equal to -' || last_name2 || '-');
  END IF;
END;
/
-TONGUC- is NOT equal to -TONGUC    -

PL/SQL procedure successfully completed.

DECLARE
  last_name1 CHAR(6)  := 'TONGUC';
  last_name2 CHAR(10) := 'TONGUC';
BEGIN
  IF last_name1 = last_name2 THEN
    DBMS_OUTPUT.PUT_LINE('-' || last_name1 || '- is equal to -' || last_name2 || '-');
  ELSE
    DBMS_OUTPUT.PUT_LINE('-' || last_name1 || '- is NOT equal to -' || last_name2 || '-');
  END IF;
END;
/
-TONGUC- is equal to -TONGUC    -

PL/SQL procedure successfully completed.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Express Edition Release 10.2.0.1.0

References Used :
Related Asktom thread
How to install and use runstats package
show_space procedure on Asktom

March 21, 2007

How the Oracle optimizer may decide between an index or a table scan access path

Filed under: Oracle How To,Oracle Performance — H.Tonguç Yılmaz @ 5:56 pm

The Oracle optimizer has always been a major performance test area during migrations. From release to release Oracle changes the optimizer code, and these changes may have unexpected outcomes.

Usually the problems are blamed on inefficient queries, but be careful: inefficient queries are usually not the cause, they are the outcome. The real causes are usually wrong or missing statistics, or database parameters.

I prepared a small demonstration, based on one of Mr. Jonathan Lewis's presentations, to show how the optimizer is affected by some basic characteristics of your queries, like data distribution, and by database parameters;

Code listing 72 : optimizer decision between index or full table access demo
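Code listing 72 is linked above; as a rough sketch of the kind of demonstration it contains (the table and column names here are only illustrative), the idea is to let data distribution and statistics drive the access path choice;

drop table t purge ;
-- roughly 1% of the rows have flag = 'N', the rest are 'Y'
create table t nologging as
  select rownum id,
         case when mod(rownum, 100) = 0 then 'N' else 'Y' end flag,
         rpad('x', 100, 'x') padding
    from all_objects
   where rownum <= 10000 ;

create index t_flag_idx on t(flag) ;

-- gather statistics including a histogram on the skewed column
exec dbms_stats.gather_table_stats(USER, 'T', method_opt=>'for columns flag size 254', cascade=>TRUE);

-- selective value : the optimizer is expected to favor the index
explain plan for select * from t where flag = 'N' ;
select * from table(dbms_xplan.display) ;

-- non-selective value : a full table scan is expected to be the cheaper choice
explain plan for select * from t where flag = 'Y' ;
select * from table(dbms_xplan.display) ;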

In conclusion, when you have a performance problem it is easy to leave the guilt with the inefficient queries, but if you are a scientific person you must go some steps further: why are these queries running with inefficient execution plans?

The reason might be related to;
– missing selective indexes,
– missing or wrongly gathered statistics,
– data distribution or row migration on the table,
– optimizer database parameters set wrong or left at their defaults

Here Oracle trace files are your friends; use them, they will show you all the information you need during problem investigation;
10046 Event explanation
10053 Event explanation
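A minimal sketch for enabling these two events at session level looks like this; the trace files go to user_dump_dest, and tracefile_identifier only makes them easier to find;

alter session set tracefile_identifier = 'my_problem_query' ;

-- 10046 level 12 : SQL trace plus bind variables and wait events
alter session set events '10046 trace name context forever, level 12' ;
-- ... run the problem query here ...
alter session set events '10046 trace name context off' ;

-- 10053 : optimizer decision trace, written while the statement is hard parsed
alter session set events '10053 trace name context forever, level 1' ;
-- ... run (hard parse) the problem query here ...
alter session set events '10053 trace name context off' ;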

Understanding the optimizer may take some years, and even then, with each new release everything you know must usually be retested :) But if you want to start this long journey here is my advice;
Watch and study this presentation
Read and study these papers
Follow this blog
Read and study this book

Testing Information : the scripts mentioned are tested on Oracle Database 9i Enterprise Edition Release 9.2.0.7.0

References Used :
n/a

March 19, 2007

Oracle Best Practices Part 3

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 8:55 am

In this series, in Part 1 I mentioned the best practices below;
1- Put your Database into ARCHIVELOG Mode
2- Comment on Tables and Columns similar to your PL/SQL or SQL Source Codes
3- Use DBMS_STATS for statistic collection
4- Using Recovery Manager(RMAN) for Backup and Recovery needs

And in the following Part 2 I mentioned the best practices below;
5- Resumable Statements and Space Allocation
6- Maintenance of Global Partitioned Indexes
7- Default Tablespaces
8- PL/SQL Bulk Operations
9- Locking Issues
10- Oracle System Event Triggers
11- Autonomous Transactions

12- Learn what is already provided with PL/SQL language

Each database has its own procedural language. PL/SQL (Procedural Language/Structured Query Language) is Oracle's server-based procedural extension to the SQL database language; it is Oracle's native programming language and its syntax strongly resembles that of Ada.

PL/SQL supports variables, conditions, arrays, exceptions and so on, and implementations from version 8 onwards have included object-oriented features. PL/SQL stored program units (functions, procedures, packages, and triggers) which perform data manipulation are compiled into an Oracle database, so their SQL code can undergo syntax checking and can be shared, which is critical for scalability. PL/SQL offers several pre-defined packages for specific purposes, which enables developers to answer time-to-market needs; no need to reinvent the wheel :) Even some complex administration tasks like scheduling, data load/unload, change data capture configuration, backup metadata or transporting tablespaces between Oracle databases are easily done and may be scripted with these supplied packages. You can also easily develop web based applications over these packages with APEX or .NET in minutes :)

PL/SQL is powerful: Oracle's advanced replication, applications, workflow solutions and lately its web based rapid application development platform Application Express were built with PL/SQL. PL/SQL data types are SQL data types, so no conversions are needed and you have tight coupling between the language and the database (cursor FOR loops, for example). You are protected from changes in the database: PL/SQL can easily survive schema changes such as a VARCHAR2(80) -> VARCHAR2(255) thanks to %TYPE declarations, sub-optimizations like cursor caching are done for you, in PL/SQL it is very hard not to use bind variables correctly, and don't forget the dependency mechanism you get.
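As a small illustration of the %TYPE point (the customers table and its name column here are hypothetical), the variable below silently follows the column if it is later widened from VARCHAR2(80) to VARCHAR2(255);

set serveroutput on
DECLARE
  -- picks up whatever customers.name is defined as today
  l_name customers.name%TYPE;
BEGIN
  SELECT name INTO l_name FROM customers WHERE id = 1;
  DBMS_OUTPUT.PUT_LINE(l_name);
END;
/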

PL/SQL provides full portability: you can start coding at the weekend on your Windows Express Edition laptop and port it to your Solaris Enterprise Edition development database on Monday, so Oracle easily becomes your virtual machine here :)

After this introduction I want to share my "database independence" point of view. Let's say your customer needs an application which will create, update and query their customers' information, so some of your functions will be fnc_get_customer_name_by_id, fnc_set_customer_name_by_id and so on. For the independence requirement these functions may be implemented in a layer that is designed for any kind of database, like Java Database Connectivity (JDBC) for instance. But we already know Oracle is not like the others, so the question is: how do you efficiently utilize the database software your customer has paid for, and still be ready to run on other database vendors?

At this point let's look at the Oracle database itself as an application designed for our needs: have you ever needed to create, update or query the dictionary directly? Here lies a best practice to investigate: we use supplied packages and DDL statements to manipulate the dictionary, and the V$ and ALL_ views to query information. So I advise approaching the "database independence" problem this way: if you want to sell customer friendly products, you have to utilize the database your customer has already invested in and use all its available features for performance and security. So start creating procedural APIs, with PL/SQL of course for Oracle; package bodies may be wrapped, just like Oracle does for its own packages, if you need to close the sources, and you call these interfaces from whatever technology you want to use. These may be web services, .NET, PHP, APEX, Java etc. over time; now you are also technology independent :)
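A minimal sketch of such a procedural API, built around the hypothetical customer functions mentioned above (the customers table, its columns, and the setter implemented as a procedure are all assumptions of mine), could look like this;

CREATE OR REPLACE PACKAGE customer_api AS
  FUNCTION  fnc_get_customer_name_by_id (p_id IN NUMBER) RETURN VARCHAR2;
  PROCEDURE prc_set_customer_name_by_id (p_id IN NUMBER, p_name IN VARCHAR2);
END customer_api;
/

CREATE OR REPLACE PACKAGE BODY customer_api AS

  FUNCTION fnc_get_customer_name_by_id (p_id IN NUMBER) RETURN VARCHAR2 IS
    l_name customers.name%TYPE;
  BEGIN
    SELECT name INTO l_name FROM customers WHERE id = p_id;
    RETURN l_name;
  END;

  PROCEDURE prc_set_customer_name_by_id (p_id IN NUMBER, p_name IN VARCHAR2) IS
  BEGIN
    UPDATE customers SET name = p_name WHERE id = p_id;
  END;

END customer_api;
/

Any client technology then only calls customer_api, never the tables directly.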

Let's give an example here: with the support of the DBMS_SQLTUNE package you may use SQL*Plus, Oracle Enterprise Manager or a Toad-like IDE for your Automatic SQL Tuning needs, or you can even easily prepare an IDE with ApEx or a plugin for Eclipse afterwards.
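As a minimal sketch of that flow (the statement text and task name are just placeholders, and the tuning advisor needs the ADVISOR privilege and the appropriate licensing);

set long 100000
DECLARE
  l_task VARCHAR2(30);
BEGIN
  l_task := DBMS_SQLTUNE.CREATE_TUNING_TASK(
              sql_text  => 'select count(*) from all_objects',
              task_name => 'demo_tuning_task');
  DBMS_SQLTUNE.EXECUTE_TUNING_TASK(task_name => 'demo_tuning_task');
END;
/

-- the advisor's findings and recommendations
SELECT DBMS_SQLTUNE.REPORT_TUNING_TASK('demo_tuning_task') FROM dual;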

If you need to move to another database vendor, you create these get and set database interfaces with that vendor's procedural language and take the benefits that language provides. In the short run this database application development methodology seems costly, since who has the qualified human resources to write all these database APIs for each vendor; but remember each database is different, and if your customer purchased Oracle you have to think about that before marketing your application as "..this software is so beautifully designed that without any change it can also run on MySQL.."

Before closing I want to mention the power of PL/SQL packages for encapsulation and reusability. Before starting any database application you first need to design your kernel packages, like error and exception management, transaction management, log management and so on. These should be completely parametric, since the needs change frequently, so they have to be easily configurable. Also with packages you have your libraries, so nothing is expected to be repeated; this favors reusability, code sharing at runtime and scalability. Check PLVision for an example implementation.

As a conclusion I may advise you that choosing the data access layer for your database application projects is very important. PL/SQL for Oracle is inevitable since;
– it is the language your software's database speaks best,
– encapsulation and packaging of your SQL needs into PL/SQL will be beneficial for your security and performance needs, especially for scalability,
– the most strategic outcome you will face in the long run is that while trying to be "database independent" you will become dependent on some software house; they will charge you as they like and give unbelievable schedules for any kind of small activity critical to your business. This is their dream come true: they will be the master and you will be the puppet! But if you force them to build this data layer, you will find yourself building new applications on top of the supplied data access interface in the future with any technology you prefer, independently from these "jackals".

Be careful, don't let any software house use your company's resources as a laboratory, since their priority will be to write once and sell the product as many times as possible.

For more reading please see Steven Feuerstein‘s resources. He is an expert on the Oracle PL/SQL language, and the author or coauthor of nine books on PL/SQL, including Oracle PL/SQL Programming and Oracle PL/SQL Best Practices;
Oracle PL/SQL Programming
Best Practice PL/SQL

And an example comparison between a Java and a PL/SQL data access layer;
SDPA getAllSubsbyMsisdn method.pdf

Continue reading with Part 4

Testing Information : the scripts mentioned are tested on Oracle Database 10g Express Edition Release 10.2.0.1.0

References Used :
Oracle® Database Application Developer’s Guide – Fundamentals 10g Release 2 (10.2)
Oracle® Database PL/SQL User’s Guide and Reference 10g Release 2 (10.2)
Oracle® Database PL/SQL Packages and Types Reference 10g Release 2 (10.2)

March 18, 2007

Joe Fuda’s SQL Snippets

Filed under: Oracle How To,Oracle Other — H.Tonguç Yılmaz @ 3:40 pm

I loved the model clause series, check it out; http://www.sqlsnippets.com/en/home.html

March 17, 2007

Some migration experiences to share

Filed under: Oracle How To,Oracle Performance — H.Tonguç Yılmaz @ 3:39 pm

Let's say you have a 24*7*365 available subscription database application with ~31 million customers' data inside. A new version of the application is designed around the new needs and wants of the marketing team, so a new to-be table design is developed, coded and tested.

Now it is time to migrate the data from the as-is table structure to the to-be table structure. Since the operation's availability is critical you may have a very limited downtime window, or maybe none at all, so you may have to design a migration method which does not require downtime.

There are three important experiences I want to mention here related to the topic;

1- Never make assumptions or promises to internal customer groups based on the timing of a partial data migration (let's say 20%) done on a pre-production or test environment. Oracle may do things slightly differently on your production environment because of your database settings, especially the cost based optimizer and parallelism settings. Strangely, these differences are usually not in your best interest :)

2- Migration means lots of I/O, especially in terms of redo and undo, so the size of the redo log buffer, the size and number of redo log files, and whether you are archiving or not will have an important impact on the total timing of the migration.

You have to choose a migration methodology which reduces the I/O, and remember the mantra of data intensive processing: create table as select (CTAS), the MERGE statement or conditional multi-table inserts, and DBMS_ERRLOG, combined with the NOLOGGING and PARALLEL options, will be of great help. Even then you may have to use PL/SQL; in that case prefer bulk operations, trust me on this :) An example of the DBMS_ERRLOG part is sketched below.
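As a minimal sketch of the DBMS_ERRLOG part mentioned above (the target_t and source_t table names are hypothetical), a direct-path insert with DML error logging diverts problem rows instead of failing the whole load;

-- create the error logging table ERR$_TARGET_T once
exec dbms_errlog.create_error_log(dml_table_name => 'TARGET_T');

INSERT /*+ APPEND */ INTO target_t
SELECT * FROM source_t
LOG ERRORS INTO err$_target_t ('migration run 1') REJECT LIMIT UNLIMITED;

-- the rejected rows and the Oracle error for each one end up here
SELECT ora_err_number$, ora_err_mesg$ FROM err$_target_t;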

But remember that bulk operations produce redo; NOLOGGING is not an option there;

Code listing 71a : PL/SQL Bulk Insert(FORALL) and Redo amount produced demo
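Code listing 71a is linked above; a rough sketch of that kind of test (the target_t table is hypothetical, and the block assumes access to V$MYSTAT and V$STATNAME) measures the redo generated by a FORALL insert like this;

set serveroutput on
DECLARE
  TYPE t_ids IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  l_ids    t_ids;
  l_before NUMBER;
  l_after  NUMBER;
BEGIN
  FOR i IN 1 .. 100000 LOOP
    l_ids(i) := i;
  END LOOP;

  SELECT ms.value INTO l_before
    FROM v$mystat ms, v$statname sn
   WHERE ms.statistic# = sn.statistic# AND sn.name = 'redo size';

  -- bulk insert : one context switch to SQL for the whole array
  FORALL i IN 1 .. l_ids.COUNT
    INSERT INTO target_t (id) VALUES (l_ids(i));

  SELECT ms.value INTO l_after
    FROM v$mystat ms, v$statname sn
   WHERE ms.statistic# = sn.statistic# AND sn.name = 'redo size';

  DBMS_OUTPUT.PUT_LINE('redo generated by the FORALL insert: ' || (l_after - l_before) || ' bytes');
  ROLLBACK;
END;
/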

And as you know, any huge update or delete may be converted into a CTAS with the NOLOGGING and PARALLEL options;

Code listing 71b : Converting an Update or Delete into a NoLogging CTAS demo
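Code listing 71b is linked above; the general pattern is something like the sketch below (the table and column names are hypothetical, and indexes, constraints, grants and statistics must be re-created on the new table before the swap);

-- instead of : UPDATE big_t SET status = 'CLOSED' WHERE end_date < SYSDATE ;
CREATE TABLE big_t_new NOLOGGING PARALLEL 4 AS
SELECT id,
       end_date,
       CASE WHEN end_date < SYSDATE THEN 'CLOSED' ELSE status END AS status
  FROM big_t;

-- swap the tables once big_t_new is complete
DROP TABLE big_t;
RENAME big_t_new TO big_t;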

3- Oracle 9iR2 and 10gR2 are two very different platforms; some things that are common sense on 10g may not work on 9i, so be careful, and check this for example.

So I am waiting for your comments and experiences on the topic. Lately I did some research on the "near-real time data warehousing" concept and fell in love with Oracle's Change Data Capture feature, which is based on the LogMiner and Streams technologies. I will be sharing my experiences in my Oracle Information Integration series.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.1.0

References Used :
N/A

March 9, 2007

RAC & Dyke – two new great presentations!

Filed under: Oracle Other — H.Tonguç Yılmaz @ 8:40 pm

Mr. Julian Dyke announced two new presentations on his web site; be sure to follow his presentations in slide show mode ;)

Transaction Internals This presentation describes the internal mechanisms provided in Oracle to support transactions, including undo and read-consistency. It also examines new undo-based features such as flashback and row dependencies. It is based on "Further Inside RAC" which Mr. Dyke presented at the Hotsos Symposium 2007 in Dallas.

RAC Internals This presentation describes how the read-consistency mechanism is implemented in a RAC environment. It is again based on "Further Inside RAC" which Mr. Dyke presented at the Hotsos Symposium 2007 in Dallas.

March 8, 2007

Oracle TimesTen In-Memory Database – Microseconds Indeed Count

Filed under: Oracle How To — H.Tonguç Yılmaz @ 1:40 pm

Today I downloaded and installed Oracle TimesTen 7.0 In-Memory Database on Linux by following these presentations;

How to install Oracle TimesTen 7.0 on Linux
How to create an Oracle TimesTen 7.0 Database on Linux

and this documentation; TimesTen 7.0 Installation Guide

Code listing 70 : TimesTen Installation Notes

The first benchmark results using the demo C code tptbm supplied by Oracle are very interesting; remember this is just a desktop PC with a single CPU running Oracle Enterprise Linux after all :)

[oracle@tcellhost performance]$ ./tptbm -proc 2 -read 85 -insert 10 tptbmdata_tt70
Connecting to the data source
Populating benchmark data store
Waiting for 2 processes to initialize
Beginning execution with 2 processes: 85% read, 5% update, 10% insert

Elapsed time: 1.4 seconds
Transaction rate: 14673.5 transactions/second
[oracle@tcellhost performance]$ ./tptbm -proc 4 -read 85 -insert 10 tptbmdata_tt70
Connecting to the data source
Populating benchmark data store
Waiting for 4 processes to initialize
Beginning execution with 4 processes: 85% read, 5% update, 10% insert

Elapsed time: 1.8 seconds
Transaction rate: 21990.1 transactions/second
[oracle@tcellhost performance]$ ./tptbm -proc 4 -read 60 -insert 20 tptbmdata_tt70
Connecting to the data source
Populating benchmark data store
Waiting for 4 processes to initialize
Beginning execution with 4 processes: 60% read, 20% update, 20% insert

Elapsed time: 2.2 seconds
Transaction rate: 18248.2 transactions/second

Next steps include replicating data between Oracle and TimesTen over the Global Cache Connect option and, of course, as a persistency stress test, shutting down the electricity of the PC during load :)

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.1.0

References Used :
Oracle Magazine Article “When Microseconds Count” by Mr.Jonathan Gennick
Oracle TimesTen In-Memory Database Product Center

March 6, 2007

Jonathan Lewis’s Hotsos 2007 event presentation

Filed under: Oracle How To,Oracle Performance — H.Tonguç Yılmaz @ 11:45 am

Mr. Lewis announced this presentation on his blog with this introduction; "..dissecting one query to see how much you can learn about how the optimizer works, and how you have to think when dealing with a difficult problem in SQL.."

Optimizing through understanding

March 4, 2007

Oracle Information Integration Part 1

Filed under: Oracle Information Integration — H.Tonguç Yılmaz @ 2:38 pm

I am working on the Change Data Capture feature of Oracle for our Extract Transform Load (ETL) process. With this feature we may be able to apply only the daily changes from our operational databases to our data warehouse's staging area. But before that I must negotiate with the operational databases' admins on how much overhead this feature will bring to their databases.

In order to do that I plan to produce Statspack reports (and 10046 trace files if needed) and compare the before and after setup conditions. Since Statspack and 10046 are standard, universally accepted performance tools, I think the decision between doing this extract-transform-load process daily with whole tables, calculating the differences from the previous day's data on the staging area, and doing it with the Change Data Capture feature will be less debatable.

So I started to do some research on Oracle's Streams Advanced Queuing (AQ) option, since LogMiner and AQ are the two most important Oracle features behind this technology. As a starting point I wanted to compare the performance of a Do It Yourself (DIY) queue to Oracle's Advanced Queuing (AQ). AQ has a long history and is a much more stable product compared to other options; it also has lots of great configure-and-use features like timeout, delay, multi-consumer and priority. But if you really do not need these features and only need to push and pull messages without any of these options, is AQ still a good choice for you?

I did similar research for this question on 8iR2 several years ago, and based on the performance results we chose to use a DIY table (an index organized table (IOT)). So the time has come to redo this comparison on 10gR2. Below you will find the setup scripts for both the AQ and the DIY-IOT queue tables and packages. Functional and benchmark test scripts are also provided, so please try them and give me feedback if I am missing something;

Code listing 69a : DIY queue with IOT setup
Code listing 69b : AQ Setup
Code listing 69c : Responce time and latching results SINGLE based
Code listing 69d : Responce time and latching results ARRAY based
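As a taste of the DIY side, here is a much simplified sketch of the kind of setup Code listing 69a contains (names are illustrative, and this version ignores concurrency controls such as locking the row being dequeued);

CREATE SEQUENCE diy_q_seq;

CREATE TABLE diy_q (
  msg_id  NUMBER        NOT NULL,
  payload VARCHAR2(400),
  CONSTRAINT diy_q_pk PRIMARY KEY (msg_id)
) ORGANIZATION INDEX;

CREATE OR REPLACE PACKAGE diy_q_pkg AS
  PROCEDURE enqueue (p_payload IN  VARCHAR2);
  PROCEDURE dequeue (p_payload OUT VARCHAR2);
END diy_q_pkg;
/

CREATE OR REPLACE PACKAGE BODY diy_q_pkg AS

  PROCEDURE enqueue (p_payload IN VARCHAR2) IS
  BEGIN
    INSERT INTO diy_q (msg_id, payload) VALUES (diy_q_seq.NEXTVAL, p_payload);
  END;

  -- takes the oldest message off the queue; no timeout, delay or multi-consumer support
  PROCEDURE dequeue (p_payload OUT VARCHAR2) IS
    l_msg_id diy_q.msg_id%TYPE;
  BEGIN
    SELECT MIN(msg_id) INTO l_msg_id FROM diy_q;
    DELETE FROM diy_q WHERE msg_id = l_msg_id
    RETURNING payload INTO p_payload;
  END;

END diy_q_pkg;
/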

In order to eliminate the caching effect I repeated the tests several times;
For the single-row runstats comparison, AQ finished 1000 enqueue and dequeue operations within 336 hsecs, whereas the DIY-IOT-Q finished 1000 enqueue and dequeue operations within 166 hsecs, so AQ used ~202% of the DIY queue's response time.

Also, looking at the latching activity of the two options, which is very critical in a concurrent environment: again for the single-row runstats comparison, the DIY-IOT-Q had ~154% of AQ's latching amount for 1000 enqueue and dequeue operations. The dominant difference comes from these three statistics: undo change vector size, redo size, IMU undo allocation size.

The second comparison was on array (bulk) enqueueing and dequeueing;
For the array runstats comparison, AQ finished 1000 enqueue and dequeue operations of a 10-row array within 1242 hsecs, whereas the DIY-IOT-Q finished 1000 enqueue and dequeue operations of a 10-row array within 1412 hsecs, so AQ used ~88% of the DIY queue's response time.

Again, looking at the latching activity of the two options, which is very critical in a concurrent environment: for the array runstats comparison, the DIY-IOT-Q had ~81% of AQ's latching amount for 1000 enqueue and dequeue operations of a 10-row array. This time the dominant difference comes from these statistics: session uga memory, session pga memory max, session pga memory, IMU Redo allocation size, undo change vector size, IMU undo allocation size, redo size.

So what are the next steps? First of all, I dislike do it yourself (DIY) approaches, since using what is already available, tested and paid for is much more efficient. Also there is no need to reinvent the wheel: Oracle supports its AQ, and there are a lot of great features you may easily configure and use depending on your customers' changing needs.

From my primitive loop tests AQ with the bulk option is still better, but this decision must be made on results monitored under load, not just a primitive isolated database test. So here is a good starting article on how to load-test Oracle to follow up. Also, doing some 10046 research on how to decrease the waits found with the bulk AQ option should be very useful.

I always try NOT to be one of those "Question Authorities" you may easily find with a search engine, so please use the test cases provided and contribute to this peer review.

Testing Information : the scripts mentioned are tested on Oracle Database 10g Express Edition Release 10.2.0.1.0

References Used :
http://www.psoug.org/reference/bulk_collect.html
Oracle® Streams Advanced Queuing User’s Guide and Reference 10g Release 2 (10.2)

March 1, 2007

Another great undocumented Oracle hint

Filed under: Oracle How To,Oracle Seminars — H.Tonguç Yılmaz @ 9:00 pm

Our master was in Istanbul for a two-day seminar and he showed us this hint; please check out this clever trick :)


set linesize 121
set timing on

drop table t1 purge ;
create table t1 nologging as select a.* from all_source a union
select b.* from all_source b union select c.* from all_source c ;
create index nui_t1 on t1(owner) nologging ;
exec dbms_stats.gather_table_stats(USER, TABNAME=>'T1', CASCADE=>TRUE);

-- this is not a caching trick, so you may want to flush the data buffers between two
SQL> SELECT count(DISTINCT line)
2  FROM t1 WHERE owner = 'SYS' ;

COUNT(DISTINCTLINE)
-------------------
6440

Elapsed: 00:00:04.11

-- only suggested with local XE test database
-- alter system flush buffer_cache ;

SQL> SELECT /*+ go_faster */ count(DISTINCT line)
2  FROM t1 WHERE owner = 'SYS' ;

COUNT(DISTINCTLINE)
-------------------
6440

Elapsed: 00:00:00.31

Oh come on, we all know that there is no FAST=TRUE type of parameter in real life, so here is what is going on behind the scenes;

Code listing 68 : Outline Example
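Code listing 68 is linked above; the mechanics behind the trick are plain stored outlines. A minimal sketch of those mechanics is below (it needs the CREATE ANY OUTLINE privilege; the real listing goes further, attaching a tuned plan to the statement text that carries the fake hint);

-- record an outline for a private category on the statement text with the fake hint
CREATE OR REPLACE OUTLINE ol_go_faster
  FOR CATEGORY go_faster_cat
  ON SELECT /*+ go_faster */ count(DISTINCT line) FROM t1 WHERE owner = 'SYS';

-- the hints that were recorded for the outline
SELECT hint FROM user_outline_hints WHERE name = 'OL_GO_FASTER';

-- any session that enables the category now gets the stored plan
-- whenever it issues exactly the same statement text
ALTER SESSION SET use_stored_outlines = go_faster_cat;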

I had a great time during this two-day seminar of Mr. Jonathan Lewis; we all know about his knowledge, but he is also a great teacher. Here are some highlights from my notes;

– the CPU cost improvements in the algorithm of Oracle's in-memory sort operations,
– alter session set "_optimizer_ignore_hints"=TRUE;
– select /*+ opt_param('parallel_execution_enabled', 'false') */ ..
– the materialize hint creates a global temporary table, efficient to use with the WITH clause
– the plan table is a global temporary table after 10g, which is very cool indeed
– dump event 10132 traces of all your queries before a migration and use them as a library to check for the problems that may occur after migration
– "there are no complicated execution plans, there are just long ones", so divide them into pieces
– be careful with "row source" and "execution plan" in tkprof analysis; execution plans with row source headings are the ones that really happened, taken from the trace file
– the push_subq hint forces the subquery to be evaluated first during execution
– the no_unnest hint for correlated subquery problems
– the swap_join_inputs(tab) hint swaps the order of the tables going into the join
– if you use many freelists (or ASSM) for insert performance, know that your clustering factor will get damaged (as in the reverse-key index case, range scans may have problems using the indexes)
– the alter table fk_table disable table lock; command will produce an error for DML that needs a table lock on the child, so there will not be any locking problem even if the foreign key column is not indexed
– do not use or teach the update table set row… feature, since this way you may be doing unnecessary updates
– after 9iR2 the select for update lock mode increased from 2 to 3
– do not believe in myths and rebuild indexes; check the coalesce option for old deleted values if it is really needed
– if you already know what the statistics should be, don't collect them, just set them with the dbms_stats.set_.. procedures. Statistics gathering is very resource consuming, so you may also collect statistics on your pre-production system to see what values to set
– the dynamic_sampling hint is good for BI systems, since parse time is negligible compared to execution time

and these are from last week;

– the RAC option adds 50% to the license cost with Enterprise Edition, whereas after 10gR1, with up to 4 CPUs and ASM, you don't pay anything extra with Standard Edition; still, additional redundant hardware resources are needed,
– a standby database doubles the license cost since it must also be fully licensed,
– the development environment is free for RAC if you are somehow able to develop with just one server and one developer :)

Thank you Julian for these RAC configuration hints :)

Testing Information : the scripts mentioned are tested on Oracle Database 10g Express Edition Release 10.2.0.1.0

References Used :
http://www.psoug.org/reference/outlines.html
Using Plan Stability to Preserve Execution Plans
Using Optimizer Hints
Oracle’s corporate pricing lists
