This time the subject is how Oracle evaluates NOT EXISTS and NOT IN clauses and hints that influence anti-join query optimization.
Below query had no respond for hours before I started to work on it, two table have some 100 millions of rows in this example;
-- original query SELECT /*+ parallel (q1,16) */ count(*) FROM party PARTITION(party_01) q1 WHERE party_id NOT IN (SELECT /*+ hash_aj */ party_id FROM party_new PARTITION(party_01))
I started with checking some critical database parameters like;
– maximum how many parallel servers configured to open,
– what is the PGA management configuration
-- critical database parameter values SELECT NAME, VALUE FROM v$parameter WHERE NAME LIKE '%paral%max%' UNION ALL SELECT NAME, VALUE FROM v$parameter WHERE NAME LIKE '%pga%tar%' UNION ALL SELECT NAME, VALUE FROM v$parameter WHERE NAME LIKE '%work%pol%'; parallel_max_servers 100 pga_aggregate_target 314572800 workarea_size_policy AUTO -- how many parallel servers are being used by whom SELECT a.qcsid, a.qcserial#, y.osuser, COUNT(*) FROM v$px_session a, v$session y WHERE y.sid = a.qcsid AND y.serial# = a.qcserial# GROUP BY a.qcsid, a.qcserial#, y.osuser 31 1034 TCYYUCEL 8 112 473 TCYYUCEL 32 123 1005 TCYYUCEL 16
So I was sure that : 100 – (8+32+16) = 44 more parallel servers can be initiated at that time.
The key prerequisite for the anti-join access paths is that the subquery of a NOT IN clause cannot be capable of returning a null value. If the subquery of a NOT IN clause returns at least one null value, then the NOT IN predicate evaluates to false. The NOT EXISTS construct, meanwhile, concerns itself only with whether a row is returned or not, and thus does not do anything differently if a null value is returned.
For a specific query we place the MERGE_AJ, HASH_AJ, or NL_AJ hint into the NOT IN subquery. MERGE_AJ uses a sort-merge anti-join, HASH_AJ uses a hash anti-join, and NL_AJ uses a nested loop anti-join.
So I add hash_aj hint and party_id IS NOT NULL condition to subquery and the query returns in 2 and a half minute;
-- suggested method 1 - via SQL*Plus set timing on set linesize 300 set pagesize 0 set serveroutput on size 1000000 alter session set max_dump_file_size=unlimited; ALTER session SET timed_statistics = true; exec DBMS_OUTPUT.ENABLE(1000000) ; ALTER SESSION ENABLE PARALLEL DDL ; ALTER SESSION ENABLE PARALLEL DML ; alter session set STATISTICS_LEVEL = ALL ; ALTER SESSION ENABLE RESUMABLE TIMEOUT 10800 ; WHENEVER SQLERROR CONTINUE NONE SELECT /*+ parallel (a 8) */ COUNT(*) FROM party PARTITION(party_01) a WHERE party_id NOT IN (SELECT /*+ hash_aj parallel(b 8) */ party_id FROM party_new PARTITION(party_01) b WHERE party_id IS NOT NULL) ; Elapsed: 00:02:29.00
With this second method I tried to reconfigure the storage parameters and statistics by creating two new tables before the anti-join;
-- suggested method 2 - via SQL*Plus set timing on set linesize 300 set pagesize 0 set serveroutput on size 1000000 alter session set max_dump_file_size=unlimited; ALTER session SET timed_statistics = true; exec DBMS_OUTPUT.ENABLE(1000000) ; ALTER SESSION ENABLE PARALLEL DDL ; ALTER SESSION ENABLE PARALLEL DML ; alter session set STATISTICS_LEVEL = ALL ; ALTER SESSION ENABLE RESUMABLE TIMEOUT 10800 ; WHENEVER SQLERROR CONTINUE NONE create table tong1 PCTFREE 0 PCTUSED 99 INITRANS 3 MAXTRANS 255 STORAGE ( INITIAL 5M NEXT 5M MINEXTENTS 5 MAXEXTENTS 512 PCTINCREASE 0 FREELISTS 3 FREELIST GROUPS 1) nologging parallel 4 PARTITION BY HASH (party_id) partitions 4 AS select /*+ parallel(aa 4) */ * from party partition(party_01) aa / alter table tong1 add constraint NN#tong1#party_id check (party_id IS NOT NULL) nologging parallel 4; EXEC DBMS_STATS.gather_table_stats(USER, 'tong1', estimate_percent => 10, block_sample => FALSE, degree => 4, granularity => 'ALL', cascade => TRUE) ; commit ; create table tong2 PCTFREE 0 PCTUSED 99 INITRANS 3 MAXTRANS 255 STORAGE ( INITIAL 5M NEXT 5M MINEXTENTS 5 MAXEXTENTS 512 PCTINCREASE 0 FREELISTS 3 FREELIST GROUPS 1) nologging parallel 4 PARTITION BY HASH (party_id) partitions 4 AS select /*+ parallel(aa 4) */ party_id FROM ANT.party_new partition(party_01) aa / alter table tong2 add constraint NN#tong2#party_id check (party_id IS NOT NULL) nologging parallel 4; EXEC DBMS_STATS.gather_table_stats(USER, 'tong2', estimate_percent => 10, block_sample => FALSE, degree => 4, granularity => 'ALL', cascade => TRUE) ; commit ; SELECT /*+ parallel(a 4) */ COUNT(*) FROM tong1 a WHERE party_id NOT IN (SELECT /*+ hash_aj parallel(b 4) */ party_id FROM tong2 b WHERE party_id IS NOT NULL); Elapsed: 00:01:48.00
Also another ~%38 improvement.
Since in an anti-join access the subquery of a NOT IN clause cannot be capable of returning a null value either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls. Even if every row in the table has a non-null value in a nullable column, you must still add the extra predicate to the WHERE clause or else Oracle will refuse to use the merge and hash anti-join access paths.
For more information please look at the examples of how anti-join access paths can be used to make some queries more efficient in the article given in the references written by Mr.Roger Schrag.
Testing Information : the scripts mentioned are tested on Oracle Database 9i Enterprise Edition Release 9.2.0.7.0
Refences Used :
“NOT IN in SQL” thread on AskTom
Metalink Note:28934.1 “Use of indexes with NOT IN subquery”
“Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN,” by Roger Schrag (December 2004)
Jonathan Lewis’s NOT IN Blog Entry