Home » RDBMS Server » Performance Tuning » Index skip scan (11.2.0.4)
Index skip scan [message #635602] Thu, 02 April 2015 03:52 Go to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
Hi,

Just a quick one as the documentation doesnt cover this case.

I'm aware conceptually how skip scans work but I'm not clear on what happens when the index involved is many columns wide.

Recently I had a problem where a user query was doing a (disasterous) skip scan over an IOT however the filter was only on the 7th column of the index.

Now, the docs say that logical subindexes are determined by the number of distinct values of the leading edge (in this case, over 900...) however it goes on to give pseudo code of how the DB processes it shown as a series of unions (http://docs.oracle.com/cd/E11882_01/server.112/e40540/indexiot.htm#CHDHGIHA). My confusion is that the examples only ever give a two column index....

In the case I've described, an index very wide and the filter quite deep down, what does it do?

Does it explode it out for the number of distinct values of columns 1-6?
Does it only use the leading column and retrieve data based on that and filter for the actual filter (this one is my gut feel and performance is in line with this expectation).

If anyone has any links to share that'd be good.

[Updated on: Thu, 02 April 2015 03:53]

Report message to a moderator

Re: Index skip scan [message #635603 is a reply to message #635602] Thu, 02 April 2015 05:05 Go to previous messageGo to next message
jitu_jack
Messages: 2
Registered: November 2014
Location: Bangalore
Junior Member

It would be good if you can provide SQL and Index structure.
Now as you said, filter is only on the 7th col. possibly skip scan can take place.
As per documentation example, it is only consider 3 col. index to demonstrate with simple logic, but in real time this computational might be bigger and depend on data pattern.

I am assuming you had stats on the table before running the query.
Re: Index skip scan [message #635604 is a reply to message #635603] Thu, 02 April 2015 05:53 Go to previous messageGo to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
I get that, I was just wondering how it handles it conceptually. I.e. is it split by the leading columnS or only ever the first?

I'm trying to establish how it actually processes.

I suppose another way to phrase the question would be - after the second column, does it matter where the filter is applied? Would we expect, if all other things are equal, the same amount of work if we were filtering only a 4th column, vs a 14th column. As an example.
Re: Index skip scan [message #635623 is a reply to message #635604] Fri, 03 April 2015 06:03 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
Well I'll first say, I have no proof of anything. But then again that has never stopped me from talking about whatever it is anyway so here goes.

It seems to me that the number of leading columns that are "unused" does not matter, the approach is the same. Oracle must identify some number of starting points inside the index, and then do an index probe from each of these starting points. So given the following examples, they all do the same thing as I see it.

create index i1 on t1 (a,b,c,d,e);

where b = 1;  -- skip a  (access on b)

where c = 1;  -- skip a,b  (access on c)

where d = 1;  -- skip a,b,c  (access on d)

where e = 1;  -- skip a,b,c,d  (access on e)

where b = 1 and c = 1;  -- skip a  (access on b,c)

where c = 1 and e = 1;  -- skip a,b  (access on c, filter on e)

and so on...


Quote:
NDV = Number of Distinct Values


if NDV is 10 for column B or
NDV is 10 for column C or
NDV is 10 for column D or
NDV is 10 for column E or
NDV is 10 for columns (B,C)

etc. then what does it matter how the number of initial starting points is determined?

Regardless of the number of skipped columns, for any given data set, the skipped columns collectively as a group will yield some NDV, and that is the number of "logical sub-indexes" as oracle puts it or "starting points" as I put it. The process is the same. Once Oracle has the starting points for its Skip Scan, it does one lookup (range scan?) from each of the starting points in the index. As long as the math works out, there is no need for a "conceptual limit" on the number of columns that can be skipped. Once again however, I note that I have found nothing in any documentation to support this view so it is entirely based on "logic".

Based on your question, I did a lot of looking and found this as well which provides a little insight into why SS works or not.

Quote:
The second point is that for skip scan to work, we need to have very few distinct values for the leading column. In fact, we need to have at least as many repeated occurances of a value on average as we can fit index entries in an average leaf block else we'll need to visit each and every leaf block, which can be performed via a full index scan at the very least much cheaper than a skip scan.


First he says "leading column" but I think it should have been "leading column(s)" because if I read query plans right (though for a SKIP SCAN access it is totally ambiguous what is really happening), there is no restriction for Skip Scan being based on skipping only the first column of an index. So to your example, we assume it is possible to skip the first six columns and use column seven as the access predicate for a bunch of skip scan starting points.

Next, don't get too into this quote. All the guy is pointing out here, is that there is a lower bound to the NDV of the group of skipped columns, before a FAST FULL INDEX SCAN would be obviously better than an INDEX SKIP SCAN operation. But this quote does help to formulate a mental picture of data density that is involved in the concept of Skip Scan, which helps to understand things.

Also, keep in mind too that the % or Rows being fetched must still be small enough to warrant use of an index in the first place. If a TABLE SCAN is cheaper, that is what you will get.

I am not sure... am I helping at all?

Kevin

Re: Index skip scan [message #635625 is a reply to message #635623] Fri, 03 April 2015 06:53 Go to previous messageGo to next message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
A little thanks, I don't think it is possible for oracle to access the index as you describe (driving directly into the middle of a composite) because of how they are stored/accessed - I think it would have to be a full scan type operation were that the case.

My understanding was (and borrowing oracle examples here) the nice, simple one in the books:

--Assume that a composite index exists on the columns (cust_gender, cust_email)


SELECT * FROM sh.customers WHERE cust_email = 'Abbey@company.com';

--In a skip scan, the number of logical subindexes is determined by the number of distinct values in the leading column.

--Conceptually, the database processes the query as follows:

SELECT * FROM sh.customers WHERE cust_gender = 'F' 
  AND cust_email = 'Abbey@company.com'
UNION ALL
SELECT * FROM sh.customers WHERE cust_gender = 'M'
  AND cust_email = 'Abbey@company.com';


Now that's all nice, simple and lovely but when we take a "wider" index (and admittedly my understanding of the "leading edge" term) it becomes less clear.


I'm going to steal your example here for easier copy/paste.


create index i1 on t1 (a,b,c,d,e);

where b = 1;  -- skip a  (access on b)



So taking where b=1 as that's the textbook as a starting point it does this (conceptually)

select * from table
where a=$NDV1 and b=$VALUE
union all
select * from table
where a=$NDV2 and b=$VALUE
union all
...
select * from table
where a=$NDVX and b=$VALUE


Straightforward enough so far. Moving to the next query:

where c = 1;  -- skip a,b  (access on c)


As I see it, this can do this conceptually two ways (if there are more, shout out Smile):

Opt1:
select * from table
where a=$NDV1 and b=B_NDV1 and c=$VALUE
union all
select * from table
where a=$NDV1 and b=B_NDV2 and c=$VALUE
union all
where a=$NDV2 and b=B_NDV1 and c=$VALUE
union all
where a=$NDV2 and b=B_NDV2 and c=$VALUE
union all
...
select * from table
where a=$NDVX and b=B_NDVX and c=$VALUE


OR

Opt2:
select * from table
where a=$NDV1 and c=$VALUE
union all
where a=$NDV2 and c=$VALUE
union all
...
select * from table
where a=$NDVX and c=$VALUE


Now the thing is, to my mind the example directly above (opt2) is a bog standard index range scan/filter and makes very little difference as to the distinct keys of column A or the depth of the "query filter" (in this case, c=$value) in relation to the index column order as the workload would remain unchanged. As I write this that seems a poor way to process, it's a single block access of the entire table every time.

However the opt1...will do more index walks than Opt1, but fewer table block access and thus the order (and NDV) of the columns matter. It'll do less work processing select * from table where c=$value than it would from select * from table where e=$value.

Without knowing for sure, it makes it very hard to check the NDV we'll process on the "leading edge". It becomes very relevant if the NDV in your example all rose by a factor of 10 per column:

NDV is 10 for column A and
NDV is 100 for column B and
NDV is 1000 for column C and
NDV is 10000 for column D and
NDV is 100000 for column E

If when querying for d=$value and it ONLY cares about A...then a skip scan is a candidate - if it cares about a,b,c then....very much less so.


I wonder if the "deeper" into the index you go, the worse these get. I admit I'd never considered for a second it would start going 7 deep(!) so I've not done a lot of research on that level of detail - however when I looked it appears it's not a common question (I also had read that article you linked previously).


I hope this makes some more sense. I'm leaning towards the processing like opt2 but I'm not convinced. I suppose the summary question could be when querying for e=$value....is the leading edge (and the NDV thereof) a....or is it concat(a,b,c,d)?

[Updated on: Fri, 03 April 2015 06:54]

Report message to a moderator

Re: Index skip scan [message #635628 is a reply to message #635602] Fri, 03 April 2015 09:08 Go to previous messageGo to next message
John Watson
Messages: 8922
Registered: January 2010
Location: Global Village
Senior Member
I've tried a little test:
drop table ss;

create table ss(a number not null,b number not null,c number not null,d number not null,e number not null);
create index ssi on ss(a,b,c,d,e);

insert into ss select mod(rownum,2),mod(rownum,2),mod(rownum,2),mod(rownum,2),mod(rownum,2) from dual connect by level < 1000000;
commit;

set autot on
select /*+ index_ss(ss,ssi) */ count(8) from ss where b=1;
select /*+ index_ss(ss,ssi) */ count(8) from ss where e=1;

I get the same estimated cost and the same number of consistent gets for each query, which I think implies that the number of index probes is determined by the cardinality of only the leading column. But I'm not sure.

--update: this is 12.1.0.2 on Windows

[Updated on: Fri, 03 April 2015 09:10]

Report message to a moderator

Re: Index skip scan [message #635631 is a reply to message #635628] Fri, 03 April 2015 11:11 Go to previous messageGo to next message
Kevin Meade
Messages: 2103
Registered: December 1999
Location: Connecticut USA
Senior Member
That makes more sense. I was being totally stupid. It does not matter how many leading columns there are, you need all rows for the leading columns anyway so just doing it based on the first column is just as good.

Kevin
Re: Index skip scan [message #635668 is a reply to message #635631] Mon, 06 April 2015 02:41 Go to previous message
Roachcoach
Messages: 1576
Registered: May 2010
Location: UK
Senior Member
I got these results in 11.2.0.3 (box I was on at the time)

John, your example will set the same values for each row.

I altered it slightly to create ever increasing NDV as we go "deeper", except the last column where the NDV is reduced:

 insert into ss select  mod(rownum,2)
 ,round(dbms_random.value(1,10))
 ,round(dbms_random.value(1,100))
 ,round(dbms_random.value(1,1000))
 ,round(dbms_random.value(1,10)) 
 from dual connect by level < 1000000;


The gets increase markedly the "deeper" I go.

Edited for brevity

 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where b=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2708435906

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    13 |  8760   (1)| 00:00:54 |
|   1 |  SORT AGGREGATE  |      |     1 |    13 |            |          |
|*  2 |   INDEX SKIP SCAN| SSI  | 59474 |   755K|  8760   (1)| 00:00:54 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("B"=1)
       filter("B"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        267  consistent gets
          0  physical reads
          0  redo size
        528  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed




 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where c=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2708435906

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    13 |  8760   (1)| 00:00:54 |
|   1 |  SORT AGGREGATE  |      |     1 |    13 |            |          |
|*  2 |   INDEX SKIP SCAN| SSI  |  6209 | 80717 |  8760   (1)| 00:00:54 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("C"=1)
       filter("C"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       4148  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed




 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where d=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2708435906

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    13 |  8760   (1)| 00:00:54 |
|   1 |  SORT AGGREGATE  |      |     1 |    13 |            |          |
|*  2 |   INDEX SKIP SCAN| SSI  |   467 |  6071 |  8760   (1)| 00:00:54 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("D"=1)
       filter("D"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       6445  consistent gets
          0  physical reads
          0  redo size
        527  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where e=1;


Execution Plan
----------------------------------------------------------
Plan hash value: 2708435906

-------------------------------------------------------------------------
| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     1 |    13 |  8760   (1)| 00:00:54 |
|   1 |  SORT AGGREGATE  |      |     1 |    13 |            |          |
|*  2 |   INDEX SKIP SCAN| SSI  | 58400 |   741K|  8760   (1)| 00:00:54 |
-------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("E"=1)
       filter("E"=1)

Note
-----
   - dynamic sampling used for this statement (level=2)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       8749  consistent gets
          0  physical reads
          0  redo size
        528  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed



So it would appear that the distinct combinations (imagined as concat(a-e)) does make a (big) difference.


edit: Modified the NDVs and results below


truncate table ss;

 insert into ss select  mod(rownum,2)
 ,round(dbms_random.value(1,10))
 ,round(dbms_random.value(1,10))
 ,round(dbms_random.value(1,10))
 ,round(dbms_random.value(1,10)) 
 from dual connect by level < 1000000;

select /*+ index_ss(ss,ssi) */ count(*) from system.ss where b=1;

        248  consistent gets


select /*+ index_ss(ss,ssi) */ count(*) from system.ss where c=1;

        285  consistent gets


 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where d=1;


        458  consistent gets

 select /*+ index_ss(ss,ssi) */ count(*) from system.ss where e=1;

       2396  consistent gets

[Updated on: Mon, 06 April 2015 03:03]

Report message to a moderator

Previous Topic: performance
Next Topic: indexing nulls
Goto Forum:
  


Current Time: Thu Mar 28 17:36:20 CDT 2024