Home » Server Options » Text & interMedia » Domain (Text) Index for Table join (Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi)
Domain (Text) Index for Table join [message #352171] Mon, 06 October 2008 08:30 Go to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
As a disclaimer, this could also be considered a performance post, but I think the route of the problem is more about what can be done with domain indexes which I would call a SQL problem...

I have recently been using Oracle Domain Context index along with some other custom indexes to do some fuzzy data lookup. I have been using this mostly to do lookups to this point but am starting to process the whole thing up which means querying based on lookup data that gets saved rather than passing data in through ad-hoc queries. So I need to be able to match a table with a context index on a certain column with two other tables one which would hold the match threshold for the query and the other which will hold the values to be looked up.

Here is a set-up example:

CREATE TABLE test_score
(score_id NUMBER, score NUMBER);

INSERT INTO test_score
     VALUES (1, 1);

CREATE TABLE test_lookup
(score_id NUMBER, lk_up_val VARCHAR2(20));

CREATE TABLE test_data
AS SELECT     SUBSTR (DBMS_RANDOM.STRING ('U', 20),
                   1,
                   ROUND (DBMS_RANDOM.VALUE (1, 20))) data_value
      FROM DUAL
CONNECT BY ROWNUM <= 10000;

INSERT INTO test_lookup
   SELECT 1,
          data_value
     FROM test_data
    WHERE ROWNUM = 1;

CREATE INDEX test_cntx
ON test_data(data_value)
INDEXTYPE IS ctxsys.CONTEXT;

BEGIN
   DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_data');
   DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_lookup');
   DBMS_STATS.gather_table_stats ('<SCHEMA>', 'test_score');
END;


Now against this structure I want to execute the following query:

SELECT /*+INDEX(dt test_cntx) */
       dt.data_value, sc.score
  FROM test_score sc, test_lookup lk, test_data dt
 WHERE sc.score_id = lk.score_id
   AND contains (dt.data_value, lk.lk_up_val) >= sc.score;


With or without the hint, I cannot get the query to use the domain index, instead it will always do a full scan of the value table and do the comparison for each and every value. See the explain plan.

SELECT STATEMENT  ALL_ROWSCost: 14  Bytes: 175  Cardinality: 5  				
	6 HASH JOIN  Cost: 14  Bytes: 175  Cardinality: 5  			
		4 NESTED LOOPS  Cost: 11  Bytes: 160  Cardinality: 5  		
			1 TABLE ACCESS FULL TABLE NAGEL.TEST_LOOKUP Cost: 3  Bytes: 21  Cardinality: 1  	
			3 TABLE ACCESS BY INDEX ROWID TABLE NAGEL.TEST_DATA Cost: 11  Bytes: 55  Cardinality: 5  	
				2 DOMAIN INDEX INDEX (DOMAIN) NAGEL.TEST_CNTX Cost: 4  
		5 TABLE ACCESS FULL TABLE NAGEL.TEST_SCORE Cost: 3  Bytes: 3  Cardinality: 1


This is only slightly slower for the 100,000 lookup values in the sample dataset, but when employed against a real data set with millions of rows the difference is enormous.

The only way I can get it to use the index is by hard coding (or passing as a parameter) the score value:

SELECT dt.data_value, sc.score
  FROM test_score sc, test_lookup lk, test_data dt
 WHERE sc.score_id = lk.score_id
   AND contains (dt.data_value, lk.lk_up_val) >= 1;


Here I get a "correct" explain plan:

SELECT STATEMENT  ALL_ROWSCost: 14  Bytes: 175  Cardinality: 5  				
	6 HASH JOIN  Cost: 14  Bytes: 175  Cardinality: 5  			
		4 NESTED LOOPS  Cost: 11  Bytes: 160  Cardinality: 5  		
			1 TABLE ACCESS FULL TABLE NAGEL.TEST_LOOKUP Cost: 3  Bytes: 21  Cardinality: 1  	
			3 TABLE ACCESS BY INDEX ROWID TABLE NAGEL.TEST_DATA Cost: 11  Bytes: 55  Cardinality: 5  	
				2 DOMAIN INDEX INDEX (DOMAIN) NAGEL.TEST_CNTX Cost: 4  
		5 TABLE ACCESS FULL TABLE NAGEL.TEST_SCORE Cost: 3  Bytes: 3  Cardinality: 1


Now I could implement something in PL/SQL to grab and store match results first pulling the scores from the score table, but the mantra always is SQL first and PL/SQL only when required so the question does anyone know why Oracle will not under any circumstances use the index in the first query? The fact that it is ignoring the hint tells me there must be some kind of technical reason but I am just not getting it.

Thanks,
Andrew
Re: Domain (Text) Index for Table join [message #352232 is a reply to message #352171] Mon, 06 October 2008 13:47 Go to previous messageGo to next message
Barbara Boehmer
Messages: 9077
Registered: November 2002
Location: California, USA
Senior Member
"Contains" is designed to be used with "> 0". So, if you use that to obtain the score, then compare the obtained score to the value in your test_score table, it will use the domain index even without a hint. Please see the reproduction and possible solution below.

-- test environment:
SCOTT@orcl_11g> CREATE TABLE test_score
  2    (score_id NUMBER,
  3  	score	 NUMBER)
  4  /

Table created.

SCOTT@orcl_11g> INSERT INTO test_score VALUES (1, 1)
  2  /

1 row created.

SCOTT@orcl_11g> CREATE TABLE test_lookup
  2    (score_id  NUMBER,
  3  	lk_up_val VARCHAR2(20))
  4  /

Table created.

SCOTT@orcl_11g> CREATE	TABLE test_data AS
  2  SELECT  SUBSTR
  3  	       (DBMS_RANDOM.STRING ('U', 20),
  4  		1,
  5  		ROUND (DBMS_RANDOM.VALUE (1, 20))) data_value
  6  FROM    DUAL
  7  CONNECT BY ROWNUM <= 10000
  8  /

Table created.

SCOTT@orcl_11g> INSERT INTO test_lookup
  2  SELECT 1, data_value
  3  FROM   test_data
  4  WHERE  ROWNUM = 1
  5  /

1 row created.

SCOTT@orcl_11g> CREATE INDEX test_cntx
  2  ON test_data (data_value)
  3  INDEXTYPE IS ctxsys.CONTEXT
  4  /

Index created.

SCOTT@orcl_11g> BEGIN
  2  	DBMS_STATS.gather_table_stats (USER, 'test_data');
  3  	DBMS_STATS.gather_table_stats (USER, 'test_lookup');
  4  	DBMS_STATS.gather_table_stats (USER, 'test_score');
  5  END;
  6  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> COLUMN data_value FORMAT A30
SCOTT@orcl_11g> SET AUTOTRACE ON EXPLAIN


-- reproduction of problem:
SCOTT@orcl_11g> SELECT /*+INDEX(dt test_cntx) */
  2  	    dt.data_value, sc.score
  3    FROM test_score sc, test_lookup lk, test_data dt
  4   WHERE sc.score_id = lk.score_id
  5  	AND contains (dt.data_value, lk.lk_up_val) >= sc.score;

DATA_VALUE                          SCORE
------------------------------ ----------
MWRUF                                   1


Execution Plan
----------------------------------------------------------
Plan hash value: 4195803967

-----------------------------------------------------------------------------------
| Id  | Operation           | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |             |   500 | 13000 |    16   (7)| 00:00:01 |
|   1 |  NESTED LOOPS       |             |   500 | 13000 |    16   (7)| 00:00:01 |
|*  2 |   HASH JOIN         |             |     1 |    15 |     7  (15)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| TEST_SCORE  |     1 |     6 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| TEST_LOOKUP |     1 |     9 |     3   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS FULL | TEST_DATA   |   500 |  5500 |     9   (0)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("SC"."SCORE_ID"="LK"."SCORE_ID")
   5 - filter("SC"."SCORE"<="CTXSYS"."CONTAINS"("DT"."DATA_VALUE","LK"."LK_
              UP_VAL"))

SCOTT@orcl_11g>


-- possible solultion:
SCOTT@orcl_11g> SELECT dt.data_value, sc.score
  2    FROM test_score sc, test_lookup lk, test_data dt
  3   WHERE sc.score_id = lk.score_id
  4  	AND contains (dt.data_value, lk.lk_up_val, 1) > 0
  5  	AND SCORE (1) >= sc.score;

DATA_VALUE                          SCORE
------------------------------ ----------
MWRUF                                   1


Execution Plan
----------------------------------------------------------
Plan hash value: 1200543538

---------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |     1 |    26 |    14   (0)| 00:00:01 |
|*  1 |  HASH JOIN                    |             |     1 |    26 |    14   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |             |     5 |   100 |    11   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL          | TEST_LOOKUP |     1 |     9 |     3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS BY INDEX ROWID| TEST_DATA   |     5 |    55 |    11   (0)| 00:00:01 |
|*  5 |     DOMAIN INDEX              | TEST_CNTX   |       |       |     4   (0)| 00:00:01 |
|   6 |   TABLE ACCESS FULL           | TEST_SCORE  |     1 |     6 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("SC"."SCORE_ID"="LK"."SCORE_ID")
       filter("SC"."SCORE"<="CTXSYS"."SCORE"(1))
   5 - access("CTXSYS"."CONTAINS"("DT"."DATA_VALUE","LK"."LK_UP_VAL",1)>0)

SCOTT@orcl_11g>

[Updated on: Mon, 06 October 2008 13:52]

Report message to a moderator

Re: Domain (Text) Index for Table join [message #352233 is a reply to message #352232] Mon, 06 October 2008 14:12 Go to previous messageGo to next message
annagel
Messages: 220
Registered: April 2006
Senior Member
It works, but it is wasteful because now you need to call the score operator which has overhead which is unnecessary. The domain operator is passed the value you are looking for so say >= exactly what you are looking for should always be the most efficient way of doing it.

In other words this approach is a massive improvement because even using the index to find anything with even a remotely possible match is infinitely better than not using it at all it is still less efficient than using the index to find exactly what you are looking for.
Re: Domain (Text) Index for Table join [message #352448 is a reply to message #352233] Tue, 07 October 2008 13:27 Go to previous message
annagel
Messages: 220
Registered: April 2006
Senior Member
Doing a bit more research into this topic and I think I may have an idea why it will not execute using the index when a non-bind or non-static value is passed.

Although the example I gave here used the context domain index, I am also working on a custom index method implemented using extensible indexing interface. To this point I have implemented an index and operators and just started looking at the extensible optimizer interface. And I think the problem is the way costs are calculated for a domain index vs a domain function.

The domain function costs interface is here and the index access cost is here.

The function is pretty run of the mill, just the cost of evoking a single call to the function no matter what, for the index things are a little different it is looking to get the cost of invoking the index for a specific range which is the way the interface will actually invoke the index in the end, when the bound condition is totally known at run time (ie no bind or anything) it is impossible to be able to estimate the cost of index access so it ops for the "safe" alternative of the function. Don't know that there is any way to fix this. The method mentioned by Barbara will get it to use the index, but it will also get it to work harder than it needs to retrieval of the rows and should also (assuming the cardinality function is well written) result in bad cardinality value which could lead to poor decision making in future joins on the returned data.
Previous Topic: select query optimization
Next Topic: oracle
Goto Forum:
  


Current Time: Thu Mar 28 05:51:45 CDT 2024