Simple SQL question

Ndt

Simple SQL question
I have this SQL :

SELECT * FROM
VVA103 WHERE D_DEB_VALI > 19960101


The optimizer decides to use a Non Matching index scan , and so i have Sync
Read instead of Prefetch (D_DEB_VALI is the third column of the index of 4
colunms )


900.000 rows with the WHERE - 1 hour
1.000.000 rows without the WHERE - 10 minutes

I have correct stats on D_DEB_VALI

RUNSTATS TABLESPACE DSNDB04.VVS103 TABLE(ALL)
INDEX (ALL KEYCARD FREQVAL NUMCOLS 1 COUNT 10
FREQVAL NUMCOLS 2 COUNT 10
FREQVAL NUMCOLS 3 COUNT 10)

When i add OR 0=0 to force the SQL to use Tablespace scan , then i've 10
minutes in response time

Can you explain me , why in this case the optimizer choose a wrong access ?

I've read a filtering formula for : COL > HV : (Hightokey - HV)/(HightoKey-
LowTokey)
In my case it is : 0,076 (Hightokey=20040130 , Lowtokey = 19000000)

The stats on this table is :
CREATOR NAME CARD NPAGES PCTPAGES PCTROWCOMP
-------- ------------------ ---------- ---------- -------- ----------
ER VVA103 1078702 359688 99 0


2 indexes :

CREATOR NAME 1stKEYCARD FULLKEYCARD NLEAF NLVLS
CLUSTRATIO
-------- ------------------ ---------- ----------- ---------- ----- --------
--
ER VVI1031 1078702 1078702 4897 3
0.9929
ER VVI1032 244925 1077361 11981 3
0.5967


VVI1032 is the index chosen by the SQL


Thank you



Neil Courtney

Re: Simple SQL question
(in response to Ndt)
Nguyen,
the optimizer didn't choose the 'wrong' path, but the one that it
decided was the best, given the data that it has to work with. The filtering
value for the VVI1032 index was the best, so it does the non-matching index
scan for the required dates. However, then it needs to get all of the data
from the table (SELECT *), and because the cluster ratio of this index is so
poor (0.5967) then you get 'Death by Random I/O'.

Do you need to select all of the data? Can you limit it to only what is in
the VVI1032 index, perhaps? Then it will be very fast (<< 10 mins).

Cheers,
Neil

-----Original Message-----
From: Nguyen [mailto:[login to unmask email]
Sent: Wednesday, January 15, 2003 6:52 AM
To: [login to unmask email]
Subject: Simple SQL question


I have this SQL :

SELECT * FROM
VVA103 WHERE D_DEB_VALI > 19960101


The optimizer decides to use a Non Matching index scan , and so i have Sync
Read instead of Prefetch (D_DEB_VALI is the third column of the index of 4
colunms )


900.000 rows with the WHERE - 1 hour
1.000.000 rows without the WHERE - 10 minutes

I have correct stats on D_DEB_VALI

RUNSTATS TABLESPACE DSNDB04.VVS103 TABLE(ALL)
INDEX (ALL KEYCARD FREQVAL NUMCOLS 1 COUNT 10
FREQVAL NUMCOLS 2 COUNT 10
FREQVAL NUMCOLS 3 COUNT 10)

When i add OR 0=0 to force the SQL to use Tablespace scan , then i've 10
minutes in response time

Can you explain me , why in this case the optimizer choose a wrong access ?

I've read a filtering formula for : COL > HV : (Hightokey - HV)/(HightoKey-
LowTokey)
In my case it is : 0,076 (Hightokey=20040130 , Lowtokey = 19000000)

The stats on this table is :
CREATOR NAME CARD NPAGES PCTPAGES PCTROWCOMP
-------- ------------------ ---------- ---------- -------- ----------
ER VVA103 1078702 359688 99 0


2 indexes :

CREATOR NAME 1stKEYCARD FULLKEYCARD NLEAF NLVLS
CLUSTRATIO
-------- ------------------ ---------- ----------- ---------- ----- --------
--
ER VVI1031 1078702 1078702 4897 3
0.9929
ER VVI1032 244925 1077361 11981 3
0.5967


VVI1032 is the index chosen by the SQL


Thank you








Ndt

Re: Simple SQL question
(in response to Neil Courtney)
I've tried with OPTIMIZE FOR 800000 ROWS , The optimizer continues to use
the index ...
As you said if the cluster ratio for the index is poor , why use it ?



Neil Courtney

Re: Simple SQL question
(in response to Ndt)
Unfortunately (or perhaps fortunately) I don't know a lot about how the
optimizer works. Not many people do. Perhaps it does not use the
CLUSTERRATIO values?

I think that there has been discussions here on this in the past. Search the
archives to see if you can get some more clues.

Bonnie Baker wrote an article about the optimizer some time back. It makes
interesting reading, but will probably not give you an answer to your
particular problem.
http://www.db2mag.com/db_area/archives/1999/q1/99sp_prog.shtml

In the mean time, if you could say what fields are in both indexes it may
help. Can you add a predicate to make it use the first index? Can you change
the table so that the second index is the clustering index instead of the
first? Does it need to be clustered the way it is for the benefit of other
queries/programs, or has this become the clustering index by default (eg.
look at the database design).

Just a few random thoughts.

Cheers,
Neil
x4927
"MG - Life's too short not to"


-----Original Message-----
From: Nguyen [mailto:[login to unmask email]
Sent: Thursday, January 16, 2003 4:41 AM
To: [login to unmask email]
Subject: Re: Simple SQL question


I've tried with OPTIMIZE FOR 800000 ROWS , The optimizer continues to use
the index ...
As you said if the cluster ratio for the index is poor , why use it ?








Patrick Bossman

Re: Simple SQL question
(in response to Neil Courtney)
Hello,
It appears your original note has some answers to your question.
You show that with the where clause predicate, 900,000 rows are returned.
But when you perform the interpolation formula, your filter factor is
approx. 7% of rows. (I'm trusting your math)

You're getting 90% of the data back and optimizer estimates 7%.

It's possible your data is skewed on a single value or small set of values
in which case you could use DSTATS to collect frequencies to address that
skew.

You could analyze the distribution of the data, and if your data is skewed
over a range of values (more data towards HIGH2KEY for instance), you could
increase LOW2KEY.

You mention that COL > :HV uses interpolation, in fact COL range LITERAL
uses linear interpolation. COL > :HV uses default interpolation which is
a chart based on COLCARDF of the column. So hopefully you really do provide
the literal, otherwise you could consider combination of frequencies and
REOPT(VARS).

Regards,
Pat Bossman