Prerequisites for using Oracle’s anti-join access paths

This time the subject is how Oracle evaluates NOT EXISTS and NOT IN clauses and hints that influence anti-join query optimization.

Below query had no respond for hours before I started to work on it, two table have some 100 millions of rows in this example;


-- original query
SELECT /*+ parallel (q1,16) */ count(*)
FROM party PARTITION(party_01) q1
WHERE party_id NOT IN (SELECT /*+ hash_aj */ party_id
FROM party_new PARTITION(party_01))

I started with checking some critical database parameters like;

– maximum how many parallel servers configured to open,

– what is the PGA management configuration


-- critical database parameter values
SELECT NAME, VALUE
FROM v$parameter
WHERE NAME LIKE '%paral%max%'
UNION ALL
SELECT NAME, VALUE
FROM v$parameter
WHERE NAME LIKE '%pga%tar%'
UNION ALL
SELECT NAME, VALUE
FROM v$parameter
WHERE NAME LIKE '%work%pol%';

parallel_max_servers    100
pga_aggregate_target    314572800
workarea_size_policy    AUTO

-- how many parallel servers are being used by whom
SELECT a.qcsid, a.qcserial#, y.osuser, COUNT(*)
FROM v$px_session a, v$session y
WHERE y.sid = a.qcsid
AND y.serial# = a.qcserial#
GROUP BY a.qcsid, a.qcserial#, y.osuser

31  1034   TCYYUCEL     8
112 473    TCYYUCEL     32
123 1005   TCYYUCEL     16

So I was sure that : 100 – (8+32+16) = 44 more parallel servers can be initiated at that time.

The key prerequisite for the anti-join access paths is that the subquery of a NOT IN clause cannot be capable of returning a null value. If the subquery of a NOT IN clause returns at least one null value, then the NOT IN predicate evaluates to false. The NOT EXISTS construct, meanwhile, concerns itself only with whether a row is returned or not, and thus does not do anything differently if a null value is returned.

For a specific query we place the MERGE_AJ, HASH_AJ, or NL_AJ hint into the NOT IN subquery. MERGE_AJ uses a sort-merge anti-join, HASH_AJ uses a hash anti-join, and NL_AJ uses a nested loop anti-join.

So I add hash_aj hint and party_id IS NOT NULL condition to subquery and the query returns in 2 and a half minute;


-- suggested method 1 - via SQL*Plus
set timing on
set linesize 300
set pagesize 0
set serveroutput on size 1000000
alter session set max_dump_file_size=unlimited;
ALTER session SET timed_statistics = true;
exec DBMS_OUTPUT.ENABLE(1000000) ;
ALTER SESSION ENABLE PARALLEL DDL ;
ALTER SESSION ENABLE PARALLEL DML ;
alter session set STATISTICS_LEVEL = ALL ;
ALTER SESSION ENABLE RESUMABLE TIMEOUT 10800 ;
WHENEVER SQLERROR CONTINUE NONE

SELECT /*+ parallel (a 8) */ COUNT(*)
FROM party PARTITION(party_01) a
WHERE party_id NOT IN (SELECT /*+ hash_aj parallel(b 8) */ party_id
FROM party_new PARTITION(party_01) b
WHERE party_id IS NOT NULL) ;

Elapsed: 00:02:29.00

With this second method I tried to reconfigure the storage parameters and statistics by creating two new tables before the anti-join;


-- suggested method 2 - via SQL*Plus
set timing on
set linesize 300
set pagesize 0
set serveroutput on size 1000000
alter session set max_dump_file_size=unlimited;
ALTER session SET timed_statistics = true;
exec DBMS_OUTPUT.ENABLE(1000000) ;
ALTER SESSION ENABLE PARALLEL DDL ;
ALTER SESSION ENABLE PARALLEL DML ;
alter session set STATISTICS_LEVEL = ALL ;
ALTER SESSION ENABLE RESUMABLE TIMEOUT 10800 ;
WHENEVER SQLERROR CONTINUE NONE

create table tong1
PCTFREE 0 PCTUSED 99 INITRANS 3 MAXTRANS 255
STORAGE ( INITIAL 5M NEXT 5M MINEXTENTS 5 MAXEXTENTS
512 PCTINCREASE 0 FREELISTS 3 FREELIST GROUPS 1)
nologging parallel 4 PARTITION BY HASH (party_id) partitions 4 AS
select /*+ parallel(aa 4) */ * from party partition(party_01) aa
/

alter table tong1
add constraint NN#tong1#party_id
check (party_id IS NOT NULL) nologging parallel 4;

EXEC DBMS_STATS.gather_table_stats(USER, 'tong1', estimate_percent => 10, block_sample => FALSE, degree => 4, granularity => 'ALL', cascade => TRUE) ;
commit ;

create table tong2
PCTFREE 0 PCTUSED 99 INITRANS 3 MAXTRANS 255
STORAGE ( INITIAL 5M NEXT 5M MINEXTENTS 5 MAXEXTENTS
512 PCTINCREASE 0 FREELISTS 3 FREELIST GROUPS 1)
nologging parallel 4 PARTITION BY HASH (party_id) partitions 4 AS
select /*+ parallel(aa 4) */ party_id FROM ANT.party_new partition(party_01) aa
/

alter table tong2
add constraint NN#tong2#party_id
check (party_id IS NOT NULL) nologging parallel 4;

EXEC DBMS_STATS.gather_table_stats(USER, 'tong2', estimate_percent => 10, block_sample => FALSE, degree => 4, granularity => 'ALL', cascade => TRUE) ;
commit ;

SELECT /*+ parallel(a 4) */ COUNT(*)
FROM tong1 a
WHERE party_id NOT IN (SELECT /*+ hash_aj parallel(b 4) */ party_id
FROM tong2 b
WHERE party_id IS NOT NULL);

Elapsed: 00:01:48.00

Also another ~%38 improvement.

Since in an anti-join access the subquery of a NOT IN clause cannot be capable of returning a null value either the columns being selected must have NOT NULL constraints or there must be predicates in the WHERE clause of the subquery to ensure there are no nulls. Even if every row in the table has a non-null value in a nullable column, you must still add the extra predicate to the WHERE clause or else Oracle will refuse to use the merge and hash anti-join access paths.

For more information please look at the examples of how anti-join access paths can be used to make some queries more efficient in the article given in the references written by Mr.Roger Schrag.

Testing Information : the scripts mentioned are tested on Oracle Database 9i Enterprise Edition Release 9.2.0.7.0

Refences Used :
“NOT IN in SQL” thread on AskTom
Metalink Note:28934.1 “Use of indexes with NOT IN subquery”
“Speeding Up Queries with Semi-Joins and Anti-Joins: How Oracle Evaluates EXISTS, NOT EXISTS, IN, and NOT IN,” by Roger Schrag (December 2004)
Jonathan Lewis’s NOT IN Blog Entry

Advertisement