I have written on external tables before;
External table’s performance for a row and undocumented Oracle functions
Unload Data with External Tables and Data Pump
This time last week several questions from our ETL group guided me to test a little more external tables on Solaris 10 and 10g Release 2. Some of the interesting questions were;
a- can we read compressed text data over external tables,
b- can we read data over symbolic linked(ln -s) files with external tables,
c- what is the performance difference of accessing an external table compared to a heap organized and partitioned table, what are the access paths and join methods.
So let me share with you some of my findings parallel to the above questions;
a- Since the extracted files were so huge and needed so much extra disk space to uncompress this was needed. Unix pipes are like magic, for example you can export to a unix pipe and read import data through the pipe, here is an example; http://www.orafaq.com/faqunix.htm#EXPORT
Within an external table creation ddl it is possible to use a unix pipe instead of a file and you can start an uncompress process to the pipe before reading the external table with dbms_scheduler for example. But external tables only work with text files as inputs directly and this is documented; http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/tables.htm#sthref2448
b- The ETL tool we used created the extracted text files under different folders and each partition’s data with the same file name. So in order to read this information over an external table even they are all uncompressed we needed to create some symbolic links. But as I linked the documentation above even the files were functional over symbolic links at operating system, Oracle created the external tables without any error but when we tried to access the data KUP-04001: error opening file error was inevitable.
Anyway, god bless sed and awk for getting this linking script output so easily!
Code Listing 208a – AWK and SED magic to create ln -s script from an Ab initio map file
Also here are resources for awk and sed on OTN;
AWK: The Linux Administrators’ Wisdom Kit By Emmett Dulaney
Using the sed Editor By Emmett Dulaney
c- The access path’s for joining two external tables or an external table with a heap organized table were HASH JOIN BUFFERED for equality conditions and SORT-MERGE JOIN for non-equality conditions, nothing unexpected here. I used SQL*Plus’s timing and autotrace features also Kyte’s runstats package during my comparisons. With autotrace statistics recursive calls and consistent gets were major differences where as with runstats outputs pga memory max, session pga memory, DB time statistics and process queue reference latch was higher with external tables and table scan rows gotten, physical read bytes statistics were very higher with heap organized tables. Still the elapsed time of heap organized tables were ~5 to 10 times less for the same operations compared to the external tables.
Code Listing 208b – Simple performance tests with External Tables
As a summary I may easily advice that external tables are not best to access several times but they are here for our Create Table as Select(CTAS) statements and compared to SQL*Loader the Select part of the CTAS is much more flexible for our transformation needs. Also IMHO tools perform well only when they are used for they were advised and designed to, so feeling unsatisfactory when experiencing an external table not reading through compressed files is really funny to me.
Testing Information : the scripts mentioned are tested on Oracle Database 10g Enterprise Edition Release 10.2.0.3.0
References Used :
Managing External Tables