My favorite performance items in DB2 LUW v10 - #1: Jump Scan

Index Proliferation - Just Say No

Big piles of indexes don't just appear on a table right from Day 1.  Typically, the number of indexes on a table grows 'organically' over time - we start with one, or two, and eventually, especially if the table is wide & active, more indexes are added to meet the needs of new queries and applications.   This often happens in spite of there being existing indexes that almost work - indexes that overlap heavily with the new candidate indexes, but aren't quite right  because they have gaps, etc.  But DB2 10 adds an indexing mechanism called 'jump scan' which can help reduce the number of indexes you might need on a table.

What can one index do for us, anyway?

As an example, imagine you have a table and index defined something like this:

create table foo( k1 int, k2 int, c1 char(20) )
create index idx1 on foo ( k1, k2 )

This works just fine for queries like

Q1: select c1 from foo where k1 = 2 and k2 = 7

or even

Q2: select * from foo where k1 = 2

Query Q2 above, with just the leading keypart k1 specified, works well with an index scan on idx1 because all keys (k1,k2) with a particular value of k1, or even a range of k1 values, form a single range of keys in the index.  We get a nice start & stop key pair for query Q2, and a table scan is avoided.   The thing below that looks like a sideways table just represents the index, with key pairs going left to right (so sue me, I got lazy and didn't draw a tree...)

But - and this is where many a new index comes from - what about queries like this?

Q3: select * from foo where k2 = 7

Well of course, you say - this will cause a tablescan, right?  And it pretty well always would, prior to DB2 10.  Keys with a given value, say k2=7, are scattered all over the index, so idx1 isn't going to be much help to us.   And - if you want index access, you'll probably have to create another index (cue ominous music - the index proliferation has begun...)

create index idx2 on foo ( k2 )

Fewer indexes can do more work in DB2 v10, with Jump Scan 

Jump Scan lets DB2 consider using an index where it wouldn't have been able to before.  Looking at our example data, with our troublesome query Q3:

Q3: select * from foo where k2 = 7

How can DB2 use the index idx1 to find the two discontiguous keys (1,7) and (2,7) - and any others with k2=7?    If k1 has low cardinality (call it N), DB2 can almost treat the index idx1 as N separate indexes.   If there are few enough distinct values of k1, and k2 is fairly selective on its own, a search like the following can be very efficient.

for each distinct value of k1
probe index idx1 for values (k1,7)

DB2's cost-based optimizer weighs the cost of the different options it has: using jump scan, vs. a normal table scan, vs. a full index scan, etc., and chooses the lowest-cost option.  Of course, the more distinct values of the 'gap column' (k1 in our case) there are, or the less selective the remaining key parts (k2 in our case) are, then the less benefit there will be from jump scan, and the more likely that DB2 will use a regular access plan instead.  And obviously - for the optimizer to accurately do its job in considering jump scan, it needs to have good statistics on key cardinality available.   Still, I've seen many cases where the requirements come together and let this feature enable index access, whereas in a previous release either a tablescan was used instead, or an extra index had to be created.    And no one wants extra indexes!

Explain yourself!

DB2 explain will verify for us that the index was chosen.  In fact, it will even show when jump scan was used.  Here is the 'quick & dirty' dynexpln output for "select * from foo where k2 = 7"

Access Table Name = SREES.FOO  ID = 4,11
|  Index Scan:  Name = SREES.IDX1  ID = 1
|  |  Regular Index (Not Clustered)
|  |  Index Columns:
|  |  |  1: K1 (Ascending)
|  |  |  2: K2 (Ascending)
|  #Columns = 2
|  Skip Inserted Rows
|  #Key Columns = 2
|  |  Start Key: Inclusive Value
|  |  |  1: [GAP Unconstrained]                  <<< k1 is a gap column
|  |  |  2: 50
|  |  Stop Key: Inclusive Value
|  |  |  1: [GAP Unconstrained]                  <<< k1 is a gap column
|  |  |  2: 50
|  Data Prefetch: Sequential(246), Readahead
|  Index Prefetch: Sequential(22), Readahead
|  Lock Intents
|  |  Table: Intent Share
|  |  Row  : Next Key Share
|  Sargable Predicate(s)
|  |  Return Data to Application
|  |  |  #Columns = 3
Return Data Completion

or, in db2exfmt output, regarding the IXSCAN operator doing the jump scan for us

    Gap Info:            Status
    ---------            ------
    Index Column 1:      Gap               <<  k1 is a gap column
    Index Column 2:      No Gap

2 Comments
3 Likes

My favorite, as well.

April 14, 2013 03:51 AM by Jun Su Lee

I remember you mentioned about Jump Scan at your office. This is good and simple example for explaning to others. Thanks.

Good to know

April 19, 2013 04:30 PM by Kishore Malla

Good to see...we can save resources.Thanks for bringing that up.

Recent Stories
Tips for getting the best INSERT performance

Statistics Event Monitors - terrible name, terrific feature

Measuring OVERHEAD and TRANSFERRATE on DB2 LUW - the why & how