Extract(Read) > Transform(Test&Apply) > Load(Learn) Blog

May 18, 2009

“RAC Aware Software Development” Discussion

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 8:32 pm

If you ask this question (is there a concept like RAC Aware Software Development?) to Oracle employees, they will not accept the concept, since RAC is positioned to be completely application transparent. So this is another chance for me to enjoy the comfort of not being an Oracle employee :)

First of all, you pay extra for the RAC option, so it is of course rational behaviour to get the most out of it. Also, RAC is only an Oracle database tier HA solution (although it provides features that help application HA). HA on the other layers should also be analyzed and solved, so an important question to answer is: “How long can you tolerate a service interruption? How much money do you lose when your service is down for n minutes?” The resulting architecture can change dramatically according to the answer to this question, and each HA solution provides a certain amount of availability, all of which comes with a certain cost. This costing is primarily a business decision rather than a technical one; the technical side is easy once the business decision is made and the budget is accepted.

Another important question to answer is “Why do we need RAC?”. The most likely answers are High Availability and/or Horizontal Scalability needs, and these are really strong motivations if you genuinely need them, so as a result you may become willing to make this major change/migration to your environment. But what if all your migration efforts end up in failure, meaning your applications still fail when a node fails and/or your applications cannot scale across multiple nodes? This kind of result may harm your position in your organization if you are the trusted Oracle guy around there, so it is important to communicate the positioning of this Oracle option for your environment to your managers.

It is always best to experience problems or unexpected behaviour during the testing phase, not in production where you have your ongoing business. But the hardest part of testing is to create a workload similar to the production load, and your test environment and scenarios must be as close as possible to the real workload. To achieve this, the RAT (Real Application Testing) option (an 11g new feature) provides a way to capture and run the real workload on a test system, but depending on your application (for example lots of distributed Oracle database calls over dblinks) the product may not be mature enough yet.

Oracle believes that a single node environment cannot be more highly available than its RAC alternative even without any application change, but for me this claim is something to consider carefully. The internal mechanisms that play a role in a RAC environment are more complex than in a single instance environment; as a result, if the DBA group responsible for managing this new environment is not experienced or is afraid of it, this will cause additional downtime (I have experienced this scenario several times with ASM and RMAN). Oracle also believes that most of the problems related to RAC migrations are caused by misconfiguration or wrong sizing of hardware and/or software, but we also experienced on a recent 11g 5-node Linux consolidation project that even if the sizing is more than enough and all best practices are followed, you may still hit bugs. When these problems popped up our project was already in production; we contacted the Oracle developers immediately and got fixes one day after the diagnosis (we were not able to catch these bugs during testing since RAT (Real Application Testing) also had several bugs). We all need to accept that there can always be bugs or limitations in any kind of software, even if it is the Oracle database :)

Also, and not specific to RAC migrations, any big change in a system (like changing the OS version, adding new hardware, changing a driver version etc.) can magnify existing problems. So one of the initial steps of a RAC migration project may be checking the tuning possibilities for the single node environment (like SQL query tuning, avoiding parsing overhead, partitioning, indexing and clustering etc. for high-load SQLs). All these strategies, and as a result some additional development and testing, will decrease the interconnect traffic dramatically.
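As a starting point for spotting those high-load SQLs, a simple v$sql query already helps; the following is a minimal sketch of mine (10g and later, and ordering by logical I/O is just one possible choice):

-- top 20 statements by logical I/O since instance startup (illustrative thresholds)
SELECT *
  FROM (SELECT sql_id, executions, buffer_gets, disk_reads,
               ROUND(buffer_gets / NULLIF(executions, 0)) gets_per_exec
          FROM v$sql
         ORDER BY buffer_gets DESC)
 WHERE ROWNUM <= 20;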

In order to increase your applications’ (not the database’s) HA you need to develop special connection methods and exception handling, so that you can detect a node failure and, for example, refresh the connection pools at the mid-tier. But the exceptions and events you need to handle may not be available to your technology stack if you are not using the Oracle JDBC, OCI or ODP.NET drivers.
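On the database side, part of what enables this is publishing FAN HA events on the service your connection pools use. A minimal PL/SQL sketch, assuming a 10gR2 or later environment and a hypothetical, already created service named OLTP_SRV (all values below are illustrative):

BEGIN
   -- hypothetical service name; the service is assumed to exist already
   -- (for example created with srvctl or dbms_service.create_service)
   dbms_service.modify_service(service_name        => 'OLTP_SRV',
                               aq_ha_notifications => TRUE,   -- publish FAN up/down events
                               failover_method     => 'BASIC',
                               failover_type       => 'SELECT',
                               failover_retries    => 180,
                               failover_delay      => 5);
END;
/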

Conclusions: if you follow shared best practices (I shared my favorite links below), architect your systems according to your needs and do careful load testing, you will minimize the risk of having problems. Also, having a close relationship with Oracle is an important success factor here. As a result, from my perspective, in order to get the most out of the RAC option you need to develop RAC aware software! :)

Oracle Real Application Clusters Sample Code

Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide
Chapter 6 Introduction to Workload Management

OC4J Data Sources: Implicit Connection Caching and Fast Connection Failover by Frances Zhao

Oracle RAC Tuning Tips by Joel Goodman

Understanding RAC Internals by Barb Lundhild


February 22, 2009

On Troubleshooting Oracle Performance book by Christian Antognini

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 9:01 pm

I want to recommend Christian Antognini‘s book Troubleshooting Oracle Performance to newbies and especially to Oracle database application developers. For years I recommended Efficient Oracle by Design by Thomas Kyte to beginners for similar reasons; this book also has lots of practical hints, and additionally lots of 11g new features are mentioned.

I especially loved Chapter 9 Optimizing Data Access, Chapter 11 Beyond Data Access-Join Optimization and Chapter 12 Optimizing the Physical Design. The last two chapters have important performance tips for Oracle database application developers; for example, the column retrieval cost in the CBO calculation and the following example on best practices in datatype selection are, in my opinion, some of the simplest and most overlooked ones. The example below on the NUMBER datatype shows how the same value may be stored with different scales, and since NUMBER is the main datatype used to store floating-point numbers and integers in Oracle database applications, these kinds of tips are really important to be aware of.


set numwidth 18

CREATE TABLE t (n1 NUMBER, n2 NUMBER(*,2));
INSERT INTO t VALUES (1/3, 1/3);
COMMIT;
SELECT * FROM t;

                N1                 N2
------------------ ------------------
,33333333333333333                ,33

SELECT vsize(n1), vsize(n2) FROM t;

         VSIZE(N1)          VSIZE(N2)
------------------ ------------------
                21                  2
				

Additionally, related to the NUMBER datatype, the book mentions that it is not efficient for number-crunching loads; as of 10g two new datatypes are available, BINARY_FLOAT and BINARY_DOUBLE, which implement the IEEE 754 standard so a CPU can process them directly, and they are fixed-length. Please check the Binary_double vs. Number in compute intensive processes post for an example of how this decision may affect performance.
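As a quick illustration of the fixed-length point (my own sketch, not taken from the book), the same 1/3 value stored in a BINARY_DOUBLE column always occupies 8 bytes:

CREATE TABLE t2 (n NUMBER, bd BINARY_DOUBLE);
INSERT INTO t2 VALUES (1/3, 1/3);
COMMIT;

-- vsize(n) returns 21 as in the example above, while vsize(bd) always returns 8,
-- since BINARY_DOUBLE is a fixed-length 8-byte IEEE 754 type
SELECT vsize(n), vsize(bd) FROM t2;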

So if you also find yourself asked to tune applications on production databases again and again like me, I strongly believe it will be a very wise long-term action to let the Oracle database application developers around you know about this book and to make sure they read and understand the last two chapters, so that they are aware of the different datatype, table type and index type options of the Oracle database during the design and implementation of their database applications.

December 29, 2008

Using Pipelined Table Functions for Siebel’s Audit_Read Data Transformation

It has been a while; I thought it was time I wrote on something :)

25-12-2008 was my second anniversary on this blog: 233 posts in 18 categories have resulted in 383 approved comments, ~330,000 page views and lots of good friendships so far. So many thanks to all who contributed!

I was very surprised that I had not written until today on one of my favorite Oracle features, Pipelined Table Functions. Since the 9i days I have benefited from pipelined functions for lots of complex transformation needs.

With this post I will share a recent example need to demonstrate this feature. Siebel, as a source system for our warehouse, has a logging table called S_AUDIT_READ; this table has a CLOB column AUDIT_LOG which holds a string to be parsed with the rules mentioned in the link below.

How to parse audit log value (Doc ID 555553.1)

As usual, it is possible to parse the string with several methods, even with a single SQL statement of course. But PL/SQL is more readable/maintainable compared to SQL, so Pipelined Table Functions are always a good alternative for transforming massive amounts of data, also in terms of performance. In the demonstration below I also deviated from the algorithm mentioned above because of the cases in our data.

Siebel’s Audit_Read Data Transformation with Oracle’s Pipelined Table Functions

With the help of Pipelined Table Functions, while reading the source data in parallel it is possible to transform it and insert it into the target table through pipes (a minimal sketch follows after the links below). For more examples please visit:

http://www.oracle-developer.net/display.php?id=207

http://psoug.org/reference/pipelined.html

http://www.oracle.com/technology/products/oracle9i/daily/may30.html
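Here is a minimal sketch of the pattern only, not the actual transformation from the demonstration above; the object and collection type names, the ROW_ID/AUDIT_LOG column names on S_AUDIT_READ and the parsing placeholder are my assumptions.

CREATE OR REPLACE TYPE audit_read_row_t AS OBJECT (
  record_id  VARCHAR2(30),
  field_name VARCHAR2(100),
  old_value  VARCHAR2(4000),
  new_value  VARCHAR2(4000)
);
/
CREATE OR REPLACE TYPE audit_read_tab_t AS TABLE OF audit_read_row_t;
/
CREATE OR REPLACE FUNCTION parse_audit_read(p_src IN SYS_REFCURSOR)
  RETURN audit_read_tab_t PIPELINED PARALLEL_ENABLE (PARTITION p_src BY ANY)
IS
  l_row_id    VARCHAR2(30);
  l_audit_log CLOB;
BEGIN
  LOOP
    FETCH p_src INTO l_row_id, l_audit_log;
    EXIT WHEN p_src%NOTFOUND;
    -- the real AUDIT_LOG parsing rules go here; a single placeholder row is piped instead
    PIPE ROW (audit_read_row_t(l_row_id, 'FIELD', NULL,
                               DBMS_LOB.SUBSTR(l_audit_log, 4000, 1)));
  END LOOP;
  RETURN;
END;
/
-- usage: read the source in parallel and insert the transformed rows into the target
-- INSERT /*+ APPEND */ INTO target_audit_read
-- SELECT * FROM TABLE(parse_audit_read(CURSOR(SELECT row_id, audit_log FROM s_audit_read)));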

Just before closing, I wish a very happy new year to all of us! :)

June 25, 2008

Additional Backup Needs and Some Thoughts Which May Help

Filed under: Oracle Best Practices,Oracle How To — H.Tonguç Yılmaz @ 7:57 am

If you are the DBA of a mission-critical database, which usually means there is no tolerance for downtime or data loss and the mean time to recover should be minimal, ARCHIVELOG mode and an RMAN hot backup will be the mandatory starting point for a strategy.

The easiest way of scheduling this kind of backup is through Database (or Grid) Control; after 10.2, using Oracle’s advised backup strategy option, which is a Flash Recovery Area based incremental RMAN backup, is what I advise if you already have enough disk space for the FRA. Lutz has not been blogging for a while and I miss Lutz’s RMAN posts, so I guess this one can be another good quick guide.

Why restore before recover? The new Oracle 10g Backup Strategy

But what if you only want to restore a parameter table with, let’s say, 86 rows, or the previous release of an Apex or some PL/SQL application, or just the metadata of a simple database object which has been changed or dropped? Here we all have our own workarounds, mostly depending on the database version and the tools we use around the database. The Data Guard and Flashback options, if you have them, may help, or you can customize your backup strategy for these kinds of alternative needs.

Restoring a parameter table with 86 rows from your RMAN backup would be throwing a huge rock at a frog, so choosing these kinds of parameter tables and including them in a daily binary table-level export may be the quickest way to revert them.
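A hedged sketch of such a daily table-level data pump export; the schema and table names are hypothetical, and the backup_dir directory object is assumed to exist (it is created as in the norows example below):

# schema/table names are hypothetical; backup_dir is the directory object created below
/dwh0/app/oracle/product/11.1.0.1/bin/expdp $BACKUP_USERNAME/$BACKUP_PASSWORD \
  TABLES=APP_OWNER.PARAMETER_TABLE \
  DUMPFILE=daily_parameter_tables.dmp.$BACKUP_DATE LOGFILE=daily_parameter_tables.log.$BACKUP_DATE \
  DIRECTORY=backup_dir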

What about a PL/SQL-based need? Here again a daily norows binary export backup may help a lot.


A data-pump norows backup example

# run once beforehand in SQL*Plus: CREATE OR REPLACE DIRECTORY backup_dir AS '/ods01/backup/daily_norows/dp';
/dwh0/app/oracle/product/11.1.0.1/bin/expdp $BACKUP_USERNAME/$BACKUP_PASSWORD \
  DUMPFILE=daily_full_norows_dp_backup.dmp.$BACKUP_DATE LOGFILE=daily_full_norows_dp_backup.log.$BACKUP_DATE \
  DIRECTORY=backup_dir FULL=y EXCLUDE=TABLE_DATA

A traditional export norows backup example

/dwh0/app/oracle/product/11.1.0.1/bin/exp $BACKUP_USERNAME/$BACKUP_PASSWORD \
  FILE=/ods01/backup/daily_norows/exp/daily_full_norows_dp_backup.dmp.$BACKUP_DATE \
  LOG=/ods01/backup/daily_norows/exp/daily_full_norows_dp_backup.log.$BACKUP_DATE \
  FULL=y ROWS=n
   

And what if you are using Apex applications or a Warehouse Builder repository? Then again, taking daily tool-level export backups will help.

OWB 11g Repository Shell Script Backup Example

There can be additional thoughts of course, like saving the database structure as a SQL file daily or immediately after any structure change.


A Simple Backup Control File to Trace Shell Script Example

/dwh0/app/oracle/product/11.1.0.1/bin/sqlplus -s > /ods01/backup/daily_norows/backup_control_file_to_trace.log.$BACKUP_DATE \
  2> /ods01/backup/daily_norows/backup_control_file_to_trace.err.$BACKUP_DATE <<EOF
$BACKUP_USERNAME/$BACKUP_PASSWORD
@/ods01/backup/daily_norows/backup_control_file_to_trace.sql $BACKUP_DATE 
exit ;
EOF

fatih@oracle $ cat /ods01/backup/daily_norows/backup_control_file_to_trace.sql
alter session set tracefile_identifier = backup_controlfile_to_trace;
alter database backup controlfile to trace;
SET ECHO OFF              
SET FEEDBACK OFF
SET PAGESIZE 0
SET LINESIZE 1000
spool /ods01/backup/daily_norows/backup_control_file_to_trace.name.&1
-- available after 11.1
select value from v$diag_info where name like 'Default Trace%' ;
spool off
SET FEEDBACK ON
SET PAGESIZE 24
SET ECHO ON
   

But I think the critical point of all this is that, whatever you do, do not forget to use a scheduler to automate these jobs and to check their logs periodically :)


Simple unix cron settings to automate the daily backups

fatih@oracle $ crontab -l
40 17 * * * /ods01/backup/daily_norows/daily_norows.sh > /ods01/backup/daily_norows/daily_norows.log 2> /ods01/backup/daily_norows/daily_norows.err
00 18 * * * /ods01/backup/backup_repos.sh > /ods01/backup/backup_repos.log 2> /ods01/backup/backup_repos.err
   

May 5, 2008

Data Guard Redo Apply and Media Recovery Best Practices 10g

Filed under: Oracle Best Practices,Oracle Performance — H.Tonguç Yılmaz @ 7:11 am

This morning I was asked to check a 10gR2 physical standby instance which was slow to apply redo, and the lag was growing rapidly; my initial check showed a duration of 9 to 12 minutes for a 250 MB redo log file to apply;


Mon May  5 06:03:17 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902209.arc
Mon May  5 06:13:20 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902210.arc
Mon May  5 06:23:07 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902211.arc
Mon May  5 06:32:51 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902212.arc
Mon May  5 06:42:40 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902213.arc
Mon May  5 06:52:01 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902214.arc
Mon May  5 07:00:52 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902215.arc
Mon May  5 07:09:58 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902216.arc
..
    

After a quick look from SQL*Plus on unix as sysdba, the top 5 waits were;


PX Deq: Test for msg   (4,5x amount)                                            
free buffer waits                                                  
log file sequential read                                           
PX Deq Credit: send blkd                                           
checkpoint completed                               
     

And after going through Note:387343.1 and Data Guard Redo Apply and Media Recovery Best Practices 10gR2, I recommended increasing the buffer cache from 512 MB to 6 GB, reducing the recovery parallelism from 20 to 8 and also setting some additional parameters like;


*.db_cache_size=6G               
*.log_buffer=78643200
*.large_pool_size=2G 
*.pga_aggregate_target=10G
*.workarea_size_policy='AUTO'
*.parallel_min_servers=360
*.parallel_max_servers=520
*.parallel_threads_per_cpu=2
*.optimizer_mode=ALL_ROWS 
*.optimizer_dynamic_sampling=4
*.cursor_sharing=EXACT

RECOVER MANAGED STANDBY DATABASE PARALLEL 8;  
     

After the restart, as I monitored the apply process from the alert.log again, the database was now able to apply a 250 MB redo log file in between 30 seconds and 1 minute;


Mon May  5 09:01:07 2008
ALTER DATABASE RECOVER  automatic from '/assos_arcdir' standby database PARALLEL 8  
Mon May  5 09:01:07 2008
Media Recovery Start
Managed Standby Recovery not using Real Time Apply
 parallel recovery started with 8 processes
Mon May  5 09:01:52 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902217.arc
Mon May  5 09:02:29 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902218.arc
Mon May  5 09:03:41 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902219.arc
Mon May  5 09:04:39 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902220.arc
Mon May  5 09:05:33 2008
Media Recovery Log /assos_arcdir/arch_rtx_1_445653769_0000902221.arc
..
     

And the Top 5 waits were now;


PX Deq: Test for msg   (x amount)
log file sequential read 
checkpoint completed 
db file parallel write 
latch: checkpoint queue latch
    

So, as my lessons learned, I really want to highlight two things; yes, again and again, these two;

1- If you have enough physical memory, benefit from it to the full; do not run your production databases with a small SGA (critical for OLTP) or PGA (critical for OLAP), especially after 9iR2 (nearly none of the latching problems large SGAs caused in the old days). Come on, even on laptops we use SGAs of several GBs nowadays.

2- Parallelism does not automatically mean faster; I benefit from reducing the degree of parallelism more often than from increasing it, and 4 or 8 is nearly always enough for my needs. I hope Oracle invests more in the automatic parallelism tuning feature in the near future, which is FALSE by default on 10gR2, so that, like automatic undo or automatic PGA management, automatic parallel tuning takes its place among my favorite automatic tuning features of Oracle :)

Here are some of the SQL scripts I used from SQL*Plus during the above troubleshooting; as in this standby case, sometimes we only have SQL*Plus and some simple v$ queries to gather the required troubleshooting information.
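As an example of the kind of simple v$ query I mean, here is a minimal sketch listing the top non-idle waits since instance startup (10g and later because of the wait_class column; time_waited is in centiseconds):

SELECT *
  FROM (SELECT event, total_waits, time_waited
          FROM v$system_event
         WHERE wait_class <> 'Idle'
         ORDER BY time_waited DESC)
 WHERE ROWNUM <= 5;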

May 4, 2008

Revisiting data migration best practices for Oracle database

Filed under: Oracle Best Practices,Oracle How To — H.Tonguç Yılmaz @ 7:57 am

Each year I somehow find myself in a project which involves a huge data migration or database upgrade. So up to now I have blogged several times on my previous experiences;

https://tonguc.wordpress.com/2007/07/13/i-am-back/
https://tonguc.wordpress.com/2007/08/31/some-best-practices-of-upgrading-oracle-database/
https://tonguc.wordpress.com/2007/07/16/deffensive-upgrade-methods-but-still-no-pain-no-gain/
https://tonguc.wordpress.com/2007/04/24/take-the-risk-and-migrate-10gr2/
https://tonguc.wordpress.com/2007/03/17/some-migration-experiences-to-share/

For a quick summary of migration best practices, I may advise you to pay close attention to tuning your SGA and PGA memory pools, to keep the target database in NOARCHIVELOG mode with ten or more big (I prefer at least 10 GB), single-membered redo log groups during the data transfer, and to test your migration setup several times until you are satisfied with the results. Once you are within the expected time window, leave it there; tuning is an endless road, do not lose control :)
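A minimal sketch of the redo log part, to be run on the target before the data transfer; the group numbers, paths and sizes below are only illustrative:

-- ten or more big, single-membered groups; paths, group numbers and the 10G size are illustrative
ALTER DATABASE ADD LOGFILE GROUP 11 ('/redo01/target_redo_g11_m1.log') SIZE 10G;
ALTER DATABASE ADD LOGFILE GROUP 12 ('/redo02/target_redo_g12_m1.log') SIZE 10G;
-- ... and so on up to the desired number of groups

-- keep the target in NOARCHIVELOG mode only for the duration of the load
-- SHUTDOWN IMMEDIATE
-- STARTUP MOUNT
-- ALTER DATABASE NOARCHIVELOG;
-- ALTER DATABASE OPEN;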

This week we needed a strategy to easily switch between migration (much more OLAP-like) and normal (much more OLTP-like) parameters. So I chose to use a pfile over an spfile for this kind of need and created a special OLAP-tuned parameter file plus an OLTP pfile from the current spfile. As a result, a normal startup will open the instance with the original OLTP-based parameters, which come from the pfile under $ORACLE_HOME/dbs.

— clean shutdown and open with OLTP like default parameter file


SQL> shutdown immediate

SQL> startup

And whenever we need the OLAP-like migration parameters, we will close the related instance and open it with the specially tuned OLAP pfile.

— clean shutdown and open with OLAP like parameter file


SQL> shutdown immediate

SQL> startup pfile=/../../FOR_MIGRATION_USE_ONLY_PFILE_BSCS.ora

In this pfile I used the IFILE option of the parameter file, so that I make sure I first take all mandatory parameters (like the destinations) from the original OLTP-like pfile and override only the ones needed for the OLAP-like settings. Below is a sample I share with you, which was designed for an 84 dual-core CPU, ~200 GB physical memory, 10.2.0.3 IBM AIX instance.

FOR_MIGRATION_USE_ONLY_PFILE_BSCS.ora
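As a minimal, hedged sketch of the IFILE approach only (the base pfile path and every value below are illustrative, not the actual settings of that file):

# FOR_MIGRATION_USE_ONLY sketch -- all values are illustrative
IFILE=/oracle/product/10.2.0.3/dbs/initBSCS.ora    # original OLTP-like pfile is read first
# OLAP-like overrides for the migration window only
sga_target=20G
pga_aggregate_target=40G
workarea_size_policy=AUTO
parallel_max_servers=256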

So, of course, any time you need a similar OLAP-like parameter file you need to customize and test the SGA, PGA and parallelism parameters depending on the instance’s hardware and the Oracle release you will be working with.

The Reference Guide of your release and the query below for hidden parameters may assist you during this customization.

— how to query hidden parameters’ information based on x$ views


select a.ksppinm  "Parameter", b.ksppstvl "Session Value",
       c.ksppstvl "Instance Value", ksppdesc "Desc"
  from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
--   and ksppinm in ('_smm_px_max_size', '_smm_max_size', '_pga_max_size')
 order by a.ksppinm;

January 29, 2008

Good SQL Practices Videos by Stephane Faroult

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 5:19 pm

This email is from the Oracle-L list and I really loved the videos; they are educative and use funny examples and images, so I wanted to share;

On Jan 27, 2008 4:03 PM, Stephane Faroult wrote:

> I have uploaded to Youtube (in 3 parts) a video derived from a
> presentation I did to some IT managers several months ago about “Good
> SQL practices”. It’s a kind of experiment for me, there may be others in
> the future.
> Enjoy.
>
> Part 1: http://www.youtube.com/watch?v=40Lnoyv-sXg
> Part 2: http://www.youtube.com/watch?v=GbZgnAINjUw
> Part 3: http://www.youtube.com/watch?v=y70FmugnhPU
>
> Stephane Faroult

Thank you, Mr. Stephane Faroult, for your efforts :)

December 19, 2007

Alejandro Vargas Oracle Maximum Availability Architecture Seminar Notes

Filed under: Oracle 10g New Features,Oracle Best Practices,Oracle Seminars — H.Tonguç Yılmaz @ 8:51 pm

Oracle Maximum Availability Architecture (MAA) is a set of high availability best practices which aims to minimize the complexity of designing optimal high availability for critical production databases. I have not worked as an infrastructure (operational) dba for years. In recent years, especially with 10g Release 2, Oracle’s availability options have evolved a lot. I try to catch up with them on my laptop, thanks to VMware, and I really admire how far everything has come with Oracle.

When I think about recent years I can easily give my job description like this: database developer, part-time fire-fighter and part-time myth-buster. I have always loved developing database applications, and I have been working with application developers as an old Oracle dba for the last 4 years. Database modeling, physical design, Oracle’s optimizer and testing for performance are mostly where I think I am useful. Also, if there is an Apex project I never miss the chance; developing Apex applications is always fun for me.

I used myth-buster on purpose, because lots of my time is dedicated to the exorcism of some evil beliefs which mostly depend on rumors or on n-year-old wrong and/or undocumented test results; “never use Oracle xyz feature, it is buggy” or “this optimizer you call is in fact an abuser, I never get what I need without hints” are very common kinds of behavior. Usually it feels like being Van Helsing: read the documentation, which is available to anyone and easy to understand, not rocket science in the end, and kill the beast with a small test case which is most probably adapted from an Asktom or Oracle Users’ Co-operative FAQ kind of resource. In the beginning this is a delicious rant, but after years it becomes boring, disappointing and dissatisfying. What is the rationale behind not reading and trying to understand the technology that you are using to make money, please advise me? These days everybody is after some quick and dirty solutions.

I used fire-fighter on purpose, because I am called in for aid when something goes really crazy or unexpected in production after a long development and testing period. These cases are the worst ones since, in my experience, it is usually too late and the changes needed will cause delays and angry customers. Still, it is a fire and I do whatever I can, and in most cases, interestingly, even this much is enough. But this is Oracle’s success of course; there is always some option that guides you directly to the root cause of the problem and then to a related workaround. Like Apex, I am also a fan of 10g Grid Control; I never miss the chance to use it, especially when there is a performance problem.

During Mr. Alejandro Vargas’ three-day seminar I strongly felt that I have to leave this mystique world of mine and return to the world where checklists, Metalink notes and documents rule. If you do not follow them, you do not even get to the point where you can ruin everything. It is unbelievable that anybody can develop some piece of code; Google and copy&paste are the major enabler technologies here. When it comes to the Oracle database, how ironic it is that it works very satisfactorily with its defaults and lets people think they are “developer”s or “dba”s. I believe that even after 10g’s self-management capabilities and the Grid Control heaven, today the database market needs experienced people more than ever. Do you think anybody can implement the RAC or Data Guard options without reading and testing carefully? And I seriously bet that if all Oracle users around the world read just the Concepts guide as a prerequisite, considering Oracle’s dominant leadership in the market, we could easily expect a technological shift; of course my share would be fewer OTN-forum-type questions, fires and myths.

When I contacted Alejandro about the content of this MAA seminar, my primary focus was to motivate our senior dba staff to use these features. We have an unlimited kind of Oracle licensing, so avoiding the options we have already purchased is both a threat and a cause of inefficiency for the company. Alejandro was the right person, and luckily he was very interested in the content, so as a result the last three days were a success. But this was only the beginning; there is a big challenge waiting. Any expert who is motivated to implement these new MAA features with 10g Release 2 will be facing another very common behavior: “if it works, never take a risk and change it”. Of course managers need to avoid risks, but if this is the only priority you kill the creativity in your organization; how can a dba take a risk for the sake of a system’s performance if you do not support him as his manager? The answer is simple: no dba would do that, and you live with a release xx years old forever. So I plan the next step to be implementing and documenting a successful MAA solution; this will be a reference point, I hope. This is my new priority and motivation for the coming months, so I really suspect that I will end up being an operational Oracle dba again, but if something unexpected happens my plan B is already in pending status :)

I hope you enjoy my three-day seminar notes below; much more is already published on Alejandro’s blog, and I guess he may also publish a post related to this seminar with the answers to our questions and his İstanbul experiences.

10gR2 Automatic Storage Management(ASM)

10gR2 Real Application Clusters(RAC)

10gR2 Data Guard

10gR2 Recovery Manager(RMAN)

10gR2 Flashback

10gR2 Enterprise Manager Grid Control

October 11, 2007

Oracle High Availability Best Practices Summary

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 12:00 am

I have written on Oracle Maximum Availability Architecture(MAA) and 11g High Availability New Features before. Below is a quick reference prepared for 10g Release 2 single instance, RAC and standby instances;

Multiplex redo log files,
Use resumable space allocation (a simple example),
Create at least two control files and multiplex them,
Enable Flashback Database,
Use Spfile,
Use Automatic Undo Management,
Use Automatic Segment Space Management,
Use Locally Managed Tablespace Management,
Use Locally Managed Temporary Tablespace Management,
Enable Archivelog Mode,
Use Flash Recovery Area,
Resynchronize the Recovery Catalog and set the CONTROL_FILE_RECORD_KEEP_TIME parameter to a long enough value (default is 7 days),
Specify a database default permanent and temporary tablespace other than SYSTEM and SYSAUX,
Enable Block Checking,
Use Auto-Tuned Checkpointing,
Use Database Resource Manager,
Log checkpoints to the alert.log,

..

All of the above practices, and many more you may add in the comments, affect the performance, availability and mean time to recover (MTTR) of your database. For more information please read the Oracle® Database High Availability Best Practices 10g Release 2 (10.2) Guide and the Oracle® Database Backup and Recovery Advanced User’s Guide 10g Release 2 (10.2).
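As a hedged sketch, a few of the items above translate into settings like the following (run as SYSDBA with an spfile; the values are illustrative, and Flashback Database additionally requires ARCHIVELOG mode):

ALTER SYSTEM SET db_block_checking = TRUE SCOPE = BOTH;          -- enable block checking
ALTER SYSTEM SET log_checkpoints_to_alert = TRUE SCOPE = BOTH;   -- log checkpoints to the alert.log
ALTER SYSTEM SET fast_start_mttr_target = 300 SCOPE = BOTH;      -- auto-tuned checkpointing

-- Flashback Database is enabled while the database is mounted:
-- SHUTDOWN IMMEDIATE
-- STARTUP MOUNT
-- ALTER DATABASE FLASHBACK ON;
-- ALTER DATABASE OPEN;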

October 9, 2007

Oracle Best Practices Part 5

Filed under: Oracle Best Practices — H.Tonguç Yılmaz @ 12:00 am

Part 1 best practices –
1- Put your Database into ARCHIVELOG Mode
2- Comment on Tables and Columns similar to your PL/SQL or SQL Source Codes
3- Use DBMS_STATS for statistic collection
4- Using Recovery Manager(RMAN) for Backup and Recovery needs

Part 2 best practices –
5- Resumable Statements and Space Allocation
6- Maintenance of Global Partitioned Indexes
7- Default Tablespaces
8- PL/SQL Bulk Operations
9- Locking Issues
10- Oracle System Event Triggers
11- Autonomous Transactions

Part 3 best practice –
12- Learn what is already provided with PL/SQL language

Part 4 best practice –
13- Defining Application Services for Oracle Database 10g

14- Cost Based Optimizer Best Practices and Advised Strategy

A. When and how to gather statistics for the cost based optimizer?

1. When to gather statistics; some candidate situations are as follows;

– After new database creations,
– After newly created and altered segments (tables, indexes, partitions, clusters etc.),
– After hardware upgrades like CPUs, I/O subsystem(gather system statistics),
– After migration from Rule based optimizer(RBO) to Cost based optimizer(CBO),
– After large amounts of data change(bulk operations, loads, purges etc.),
– After new high/low values for keys are generated,
– After 10g, optimizer statistics are automatically gathered with the GATHER_STATS_JOB job; this job gathers statistics on all objects in the database which have missing or stale statistics,
– Also, the data dictionary and fixed tables need statistics after 10g (a minimal sketch follows below).
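For that last item, a minimal sketch using the 10g procedures could be;

BEGIN
   -- gather data dictionary statistics (SYS, SYSTEM and component schemas)
   dbms_stats.gather_dictionary_stats;
   -- gather statistics on the fixed (X$) tables, ideally while a representative load is running
   dbms_stats.gather_fixed_objects_stats;
END;
/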

2. How much to gather; using sample sizes is recommended;

– estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE is optimal for uniform sizes and distributions of data, but problems can occur with skewed data or if there are a lot of NULLs in a column,
– Smaller sample sizes can be preferred for large objects to save time and resource consumption,
– Row sampling means a full scan of the data while sampling rows, whereas block sampling reduces I/O.

3. How to keep the statistics up to date; the methodology summarized in “A Practical Approach to Optimizer Statistics in 10g” by Andrew Holdsworth, Director of Real World Performance, is as follows;

– The best practices mentioned in A.1 and A.2 will work well for 90% of your SQLs. In the initial post of this series, with item 3 “Use DBMS_STATS for statistic collection”, I discussed the advantages of DBMS_STATS over ANALYZE for statistics collection. So prefer the dbms_stats package over the analyze command (we still need analyze for backward compatibility, for validate structure (which also populates index_stats) and for the list chained rows option); here is an example of the advised general stats gathering method;


BEGIN
   dbms_stats.gather_schema_stats(ownname          => USER,
                                  options          => 'GATHER AUTO',
                                  degree           => 4,
                                  estimate_percent => dbms_stats.auto_sample_size,
                                  cascade          => TRUE,
                                  method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                  granularity      => 'AUTO');
   COMMIT;
END;
/

BEGIN
   dbms_stats.gather_table_stats(ownname          => USER,
                                 tabname          => 'EMPLOYEES',
                                 degree           => 4,
                                 estimate_percent => dbms_stats.auto_sample_size,
                                 cascade          => TRUE,
                                 method_opt       => 'FOR ALL COLUMNS SIZE AUTO',
                                 granularity      => 'AUTO');
   COMMIT;
END;
/

– But for 9% of your SQLs you may need to refine the stats collection method based on the data distribution: manual histogram management, setting statistics, etc.
– And for the remaining 1% of your SQLs the sample size is irrelevant; you have to get assistance from options like hints, outlines and SQL profiles on 10g. These options must be preferred only as a last resort.
– Do not fall into the parameter change trap;
step 1) A bad plan is discovered for Query A,
step 2) An optimizer parameter is changed to fix Query A,
step 3) This change causes a bad plan for Query B,
step 4) then another optimizer parameter change to fix Query B brings you back to step 1 :)

Use the 10046 and 10053 events for detailed analysis of CBO decisions; changing the value of an optimizer parameter just to fix one query will cause new problems.
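As a reminder of the event syntax (a sketch to be used only in a test session, and do not forget to turn the events off):

-- extended SQL trace with binds and waits for the current session
ALTER SESSION SET EVENTS '10046 trace name context forever, level 12';
-- ... run the problem statement ...
ALTER SESSION SET EVENTS '10046 trace name context off';

-- CBO decision trace for statements hard parsed in the current session
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';
-- ... run the problem statement (it must be hard parsed to be traced) ...
ALTER SESSION SET EVENTS '10053 trace name context off';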

B. Ensuring representative statistics to guarantee good performance

To determine non-representative statistics, the basic analysis steps are;
Step 1) Check if the row estimates from the explain plan and v$sql_plan_statistics_all are correct,
a. V$SQL_PLAN_STATISTICS_ALL columns to compare;
CARDINALITY = optimizer row estimate
LAST_OUTPUT_ROWS = actual row count
b. The E-Rows and A-Rows columns in the explain plan, taken with the steps below;

-- To call dbms_xplan.display_cursor you need access to several of the dynamic performance views:
-- v$session, v$sql, v$sql_plan and v$sql_plan_statistics_all
-- set serveroutput off if it is enabled
-- The hint /*+ gather_plan_statistics */ enables collection of rowsource execution statistics for the statement
-- Reference : http://jonathanlewis.wordpress.com/2006/11/09/dbms_xplan-in-10g/
--
set serveroutput off 
select /*+ gather_plan_statistics */ ...
select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

Step 2) Check if the row estimates are correct in all_[tab|part]_col_statistics,
Step 3) Check if the sample size is large enough to give accurate row estimates,
Step 4) Gather more accurate stats, or use dbms_stats.set_…._stats to set the calculated accurate stats, and start again from Step 1.

C. How to capture the SQL workload against a schema

1. DML monitoring information is collected and can be monitored at [DBA|ALL|USER]_TAB_MODIFICATIONS;

– This information is needed by dbms_stats to identify the objects with “stale” statistics,
– This option is enabled by default after 10g, but for 9i it must be set manually:

alter table table-name monitoring;

– Tracked information can be queried from [DBA|ALL|USER]_TAB_MODIFICATIONS;

select table_name, partition_name, inserts, updates, deletes from all_tab_modifications;
TABLE_NAME PARTITION_NAME    INSERTS UPDATES DELETES
---------- --------------- --------- ------- -------
USERS      USERS14                 0    3328       0
ACCOUNT    ACCOUNT23           23450     738     230
EMP        EMP19                2409     390       0

– 9i and 10g use a 10% change as the threshold to consider statistics stale; this value is no longer hard-coded as of 11g and can be set with dbms_stats.set_table_prefs (a minimal sketch follows below)
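A minimal 11g sketch (the owner and table names are hypothetical):

BEGIN
   -- lower the staleness threshold of a volatile table from the default 10% to 5%
   dbms_stats.set_table_prefs(ownname => 'APP_OWNER',
                              tabname => 'ACCOUNT',
                              pname   => 'STALE_PERCENT',
                              pvalue  => '5');
END;
/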

2. Column usage history (predicate information) is also collected and can be monitored at sys.col_usage$;

– This information is again needed by dbms_stats to identify candidate columns on which to build histograms when method_opt=>’for …size auto’,
– Tracked information can be queried from sys.col_usage$ and the information persists after shutdown (a query sketch follows below).
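A minimal sketch of such a query on the SYS-owned base tables (it requires the corresponding privileges):

-- which columns were used in which kinds of predicates, and when they were last seen
SELECT o.name  table_name,
       c.name  column_name,
       u.equality_preds, u.equijoin_preds, u.range_preds, u.like_preds, u.null_preds,
       u.timestamp last_used
  FROM sys.col_usage$ u, sys.obj$ o, sys.col$ c
 WHERE o.obj#    = u.obj#
   AND c.obj#    = u.obj#
   AND c.intcol# = u.intcol#
 ORDER BY o.name, c.name;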

Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
Oracle® Database Performance Tuning Guide 10g Release 2 (10.2)
Part IV Optimizing SQL Statements
http://structureddata.org/presentations/

