In Part 1 after a brief introduction on default indexing feature B*Tree Indexes in Oracle I mentioned “1- Virtual / NoSegment Indexes”. And in Part 2 “2- Reverse Key Index and Index Range Scan Problem” and “3- How to disable an index” were the topics.
4- Oracle Text option
For a brief history, interMedia after Oracle 8i is a feature for managing rich data such as documents, video, audio, and images in Oracle. And starting with Oracle 9i, the text searching components were named as Oracle Text and are not a part of interMedia services. Oracle Text is not separately licensed product but rather a core feature of the database, and can be used with all Oracle Database editions, including the free Express Edition(XE).
Oracle Text is a powerful search technology that enables you to build text query applications and document classification applications. Oracle Text provides indexing, word and theme searching, and viewing capabilities for text and with the development APIs allow software developers to easily implement full-featured content search applications.
Here are some highlights why you may want to use Text option;
– Oracle text uses standart SQL to index, search, and analyze text and documents stored in the Oracle database, in files and on the Web.
– Oracle text can perform linguistic analysis on documents.
– Oracle text can search text using a variety of strategies including keyword searching, contextual queries, Boolean operations, pattern matching, HTML/XML section searching etc.
– Oracle Text integrates with and benefits from features like; Partitioning, RAC, Query optimization.
Text searching examples without Oracle Text option;
select id from mytext where instr( thetext, 'Oracle') > 0; select id from mytext where thetext like '%Oracle%';
Text searching example with Oracle Text option;
-- In order to use Oracle Text PL/SQL Packages, the user must be first granted the role CTXAPP. create index mytext_idx on mytext( thetext ) indextype is CTXSYS.CONTEXT select id from mytext where contains(thetext, 'Oracle') > 0; select score(1), id from mytext where contains ( thetext, 'oracle or california', 1 ) > 0 order by score(1) desc;
I will demonstrate the context type index, for other types of Oracle Text indexes visit this link please.
In below example we want to find the rows containing ‘FORALL’ word in the cust_street_address column. First way to accomplish this task is doing a full table scan access, second way to do is creating a domain index with Oracle Text and using the contains operator. We will also create another function based index to show that this index can not help for a ‘%FORALL%’ kind of search, so what performance difference does it make;
There are a total of four tables that are automatically created whenever you create an Oracle Text index. These table names will always have a prefix of DR$, then the index name, and then a suffix of either $I, $K, $N, or $R. These tables will always be created in the same Oracle schema that owns the Oracle Text index. The $I table consists of all the tokens that have been indexed, together with a binary representation of the documents they occur in, and their positions within those documents. The $K table is an index organized table which maps internal DOCID values to external ROWID values. The $R table is designed for opposite lookup for $K table_ fetching a ROWID when you know DOCID value. The $N table contains a list of deleted DOCID values, which is used (and cleaned up) by index optimization process.
After 10g for CONTEXT indexes after 10g with CTX_DDL.SYNC_INDEX support we can bundle the index maintainance all up in a single DDL;
create index TEXT_IDX on T( TEXT_COL ) indextype is ctxsys.context parameters ('sync (every 'SYSDATE+10/1440')');
which creates a DBMS_SCHEDULER job for you, for above example running every 10 minutes. Another alternative is the “sync-on-commit” which ensures your text indexes stay right up to date(at the cost of index fragmentation and of course, commit performance).
create index TEXT_IDX on T( TEXT_COL ) indextype is ctxsys.context parameters ('sync (on commit)');
Also after 10g Oracle provides another complete solution for enterprise search, Ultra Search.
Intermedia and Location-based(Spatial, Locator) features of Oracle are very interesting to study, since you already paid a lot for your database software in order not to re-invent a dumper wheel check these features. Only by knowing the feature is available within the database(you do not need to be an expert on every each of them) you will gain competitive advantage both for long term needs of your customers and being time to market.
In Part 4 topic will be my favorite index type Bitmap Indexes, we will try to understand where to use and more important where NOT to use them with examples. Until than this article will be a good introduction.
Testing Information : the scripts mentioned are tested on Oracle Database 10g Express Edition Release 10.2.0.1.0
Refences Used :
OTN article Building Full-Text Search Applications with Oracle Text by Marko Asplund
“Unscharfe Suche” in Datenbeständen
Connor McDonald, http://www.oracledba.co.uk
Expert one–on-one Oracle. Chapter 17-interMedia. Thomas Kyte
interMedia Text & Oracle Text presentation by Gözde Ayrancı