10g’s ORA_HASH function to determine if two Oracle tables’ data are equal

Before 10g, instead of the ORA_HASH SQL function, we used the DBMS_UTILITY.GET_HASH_VALUE PL/SQL function for this need, of course with its performance cost. If you want to compare two tables' metadata or data, there are several open source options you may already be aware of:
— a SQL based row by row comparison
http://www.dbspecialists.com/scripts/compare_data.sql
— a SQL based metadata comparison
http://www.dbspecialists.com/scripts/compare_schemas.sql
— a summary of other options
http://dgielis.blogspot.com/2006/01/compare-2-oracle-schemas.html

But what if your tables are in different databases (network cost of database links) and they are some 100s of GBs in size each? If you do not have any large object type columns, and the 4294967295 maximum bucket size of ORA_HASH is enough for you to avoid hash collisions, then this can be one of the best (both accurate and fast) solutions (I guess, any comments?).
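
The core idea is a minimal sketch like the one below (the full listings are linked just below). T1, C1 and C2 are illustrative names, REMOTE_DB is a hypothetical database link, and '|' is a column delimiter (see Sergey's comment at the end of this post on why a delimiter matters):

-- Order-independent aggregate of per-row hashes; only the two totals
-- travel over the network, not the 100s of GBs of rows.
SELECT SUM(ORA_HASH(C1 || '|' || C2)) AS local_hash FROM T1;
SELECT SUM(ORA_HASH(C1 || '|' || C2)) AS remote_hash FROM T1@REMOTE_DB;
-- Different totals prove the tables differ; equal totals mean they are
-- identical with very high probability (hash collisions remain possible).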

Code Listing 206a – 10g’s ORA_HASH function to determine if two Oracle tables’ data are equal
Code Listing 206b – A simple hash collision example with dbms_utility
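
Listing 206b is linked above; as a rough idea of what such a demonstration looks like, here is a hedged sketch of my own (not the listing itself). With only 1024 buckets, the pigeonhole principle guarantees that two of the first 1025 distinct inputs collide:

SET SERVEROUTPUT ON
DECLARE
  TYPE t_seen IS TABLE OF VARCHAR2(30) INDEX BY PLS_INTEGER;
  l_seen t_seen;
  l_hash PLS_INTEGER;
BEGIN
  FOR i IN 1 .. 2000 LOOP
    -- hash each input into one of 1024 buckets (1 .. 1024)
    l_hash := DBMS_UTILITY.GET_HASH_VALUE('row#' || i, 1, 1024);
    IF l_seen.EXISTS(l_hash) THEN
      DBMS_OUTPUT.PUT_LINE('Collision: ' || l_seen(l_hash) || ' and row#' ||
                           i || ' both hash to bucket ' || l_hash);
      EXIT;
    END IF;
    l_seen(l_hash) := 'row#' || i;
  END LOOP;
END;
/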

[ NEW UPDATES MADE on April 13 2008 ]

During a row-by-row migration (strategies like unload to text on the source > ftp the text files to the target > load them in parallel on the target, or, as in our case, insert into ... select from over a dblink plus applying the changes a CDC tool captured on production during this logical data transfer), a validation step becomes very important.

In my point of view, this validation must involve three major parts:

1- the metadata validation

In this part you must ensure that all objects, their statuses (constraints, triggers, etc.) and their attributes (comments on tables, etc.) are transferred exactly as they were in the source. Here you may also get assistance from the Schema Comparison plugins of favorite Oracle database development IDEs like PL/SQL Developer and Toad.
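
For a quick first pass, a hedged sketch like the following can help (SOURCE_DB is a hypothetical database link and APP_OWNER a placeholder schema; run the MINUS in both directions to catch objects missing on either side):

SELECT object_name, object_type, status
  FROM dba_objects@SOURCE_DB
 WHERE owner = 'APP_OWNER'
MINUS
SELECT object_name, object_type, status
  FROM dba_objects
 WHERE owner = 'APP_OWNER';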

2- the data validation

In this part you must ensure that all rows are transferred as they were in the source. ORA_HASH is a fast and accurate tool here, as mentioned above. During this kind of validation, the first thing to be sure about is, of course, the NLS settings of your source and target:

from SQL*Plus compare the outputs of:
SHOW PARAMETER NLS_LANGUAGE
SHOW PARAMETER NLS_TERRITORY
SHOW PARAMETER NLS_LENGTH_SEMANTICS
SELECT name, value$ FROM SYS.PROPS$ WHERE name = 'NLS_CHARACTERSET';

from Unix shell compare the outputs of:
env | grep NLS
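
If a database link between the two already exists, the whole NLS setup can also be compared in one query (SOURCE_DB is again a hypothetical link name; an empty result means the settings match):

SELECT parameter, value FROM nls_database_parameters
MINUS
SELECT parameter, value FROM nls_database_parameters@SOURCE_DB;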

Below is a sample application to create the validation scripts dynamically from the given parameters.

A data validation framework for 10g with ORA_HASH
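
The linked framework itself is not reproduced here, but as a rough 10g-compatible sketch of the idea, a generator can walk the dictionary and print one hashing query per table (NULL and datatype handling are left out for brevity):

SET SERVEROUTPUT ON
DECLARE
  l_cols VARCHAR2(32767);
BEGIN
  FOR t IN (SELECT table_name FROM user_tables ORDER BY table_name) LOOP
    l_cols := NULL;
    -- delimiter-joined concatenation of all columns, in column order
    FOR c IN (SELECT column_name FROM user_tab_columns
               WHERE table_name = t.table_name ORDER BY column_id) LOOP
      l_cols := l_cols ||
                CASE WHEN l_cols IS NOT NULL THEN ' || ''|'' || ' END ||
                c.column_name;
    END LOOP;
    DBMS_OUTPUT.PUT_LINE('SELECT SUM(ORA_HASH(' || l_cols || ')) FROM ' ||
                         t.table_name || ';');
  END LOOP;
END;
/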

If you want to use ORA_HASH with releases prior to 10g, you can create a temporary 10g instance near the 9i database host and use dblinks, as sketched after the link below.

A data validation framework for 9i with ORA_HASH
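
In other words, the 10g-only ORA_HASH call executes on the temporary 10g instance while the rows are pulled from 9i over the link (LINK_9I, APP_OWNER, T1, C1 and C2 are hypothetical names):

SELECT SUM(ORA_HASH(C1 || '|' || C2)) AS hash_9i
  FROM app_owner.t1@LINK_9I;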

3- the performance validation

After the transfer is validated successfully, since this is a logical (row-by-row) transfer, you must ensure that no performance degradation has crept in. New statistics for both the objects and the system must be gathered. Here ASSM tablespace management needs closer attention: since it works to reduce concurrent insert contention, the clustering factor of your indexes may change unexpectedly, which may especially hurt the costs of your index range scans, and as a result your execution plans may change after the migration. So you may prefer to use an APPEND hint, but not together with a PARALLEL hint, for your inserts during the migration.
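
As a hedged sketch of these post-migration steps (APP_OWNER and SOURCE_DB are placeholder names):

-- fresh object statistics for the migrated schema, plus system statistics
BEGIN
  DBMS_STATS.GATHER_SCHEMA_STATS(ownname => 'APP_OWNER', cascade => TRUE);
  DBMS_STATS.GATHER_SYSTEM_STATS;
END;
/
-- compare clustering factors against the values recorded on the source
SELECT index_name, clustering_factor
  FROM dba_indexes
 WHERE owner = 'APP_OWNER';
-- during the migration itself: APPEND without PARALLEL, so the source row
-- order (and with it the clustering factor) is better preserved
INSERT /*+ APPEND */ INTO app_owner.t1
SELECT * FROM app_owner.t1@SOURCE_DB;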

For more information on this topic please see the Cost-Based Oracle Fundamentals book by Jonathan Lewis, Chapter 5 – The Clustering Factor (we are lucky, since the related chapter is available for download here :)

Testing Information: the scripts mentioned were tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0

References Used :
http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:26043356526642
http://www.freelists.org/archives/oracle-l/03-2006/msg00812.html
Ertürk Diriksoy's comment on my email at the group Kahin_Tartisma
http://psoug.org/reference/ora_hash.html

4 Comments

  1. kosour says:

    Hi, tonguc!

    Great article! I've used the function dbms_utility.get_hash_value in my ETL scripts for ages. Now I will take a closer look at ora_hash. Thank you for the tip.

    But I've noticed that there is a little logical error in your script. btw, I ran into this error too :)

    Let's say we have a table with 2 columns, and in one of the rows the values in the columns were 12 and 3. So according to your script, the hash was computed from the string 123.

    The next load gives us this row with the changed values 1 and 23, and again the string for hashing will be 123. So your CDC algorithm will miss this change. In order to avoid this problem I would use a delimiter in the hashing string. My favorite one is '!'

    Of course, 12!3 is not equal to 1!23.

    Regards,
    Sergey.

  2. Sergey, thank you for your time and comment; you are absolutely right, depending on the need it is important to use a separator.
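
    A quick way to see it from SQL*Plus, mirroring your literals:

    SELECT ORA_HASH('12' || '3')        AS no_delim_a,  -- hashes '123'
           ORA_HASH('1' || '23')        AS no_delim_b,  -- also '123': same hash
           ORA_HASH('12' || '!' || '3') AS delim_a,     -- hashes '12!3'
           ORA_HASH('1' || '!' || '23') AS delim_b      -- hashes '1!23': differs
      FROM dual;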

    Sorry I was a little late answering because of OOW08.
