[DB2 NFM z/OS] Optimizer deficiency?

Walter Janißen

[DB2 NFM z/OS] Optimizer deficiency?
Hi

I wonder, why the optimizer doesn't use index only access for the following
query:

SELECT OEBZA_ART_SL, F_OE_NR, F_OE_KURZ_NAME
FROM db2.OBTB0707
WHERE A_OE_ART_SL = 'D'
AND A_OE_NR = ?
AND HIST_LNR = ?
AND F_OE_ART_SL = 'D'
AND OEBZA_ART_SL IN ('60', '63')
ORDER BY OEBZA_ART_SL
OPTIMIZE FOR 1 ROWS
FETCH FIRST 1 ROW ONLY
FOR FETCH ONLY

DB2 picks up index OBIX0707:

unique
clustering (ratio: 98%),
48.945 leaf pages, 4 levels
6.739.028 distinct entries
statstime is 2006-11-15-06.30.58.790391
key columns:
1.) A_OE_ART_SL (asc.) 4 distinct entries
2.) A_OE_NR (asc.) 683.777 distinct entries
3.) HIST_LNR (asc.) 2.049.877 distinct entries
4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
6.) F_OE_NR (asc.) 6.435.090 distinct entries
7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries

But there is another index, where index only is possible:

OBIX3707:

unique
not clustering (ratio: 98%),
105.356 leaf pages, 4 levels
6.742.778 distinct entries
statstime is 2006-11-25-00.30.23.653995
key columns:
1.) A_OE_ART_SL (asc.) 4 distinct entries
2.) A_OE_NR (asc.) 683.777 distinct entries
3.) HIST_LNR (asc.) 2.049.877 distinct entries
4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
6.) F_OE_NR (asc.) 6.435.090 distinct entries
7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
8.) F_OE_KURZ_NAME (asc.) 6.742.778 distinct entries
9.) ABLOESE_HIST_LNR (asc.) 6.742.778 distinct entries
10.) GB_DAT (asc.) 6.742.778 distinct entries
11.) GE_DAT (asc.) 6.742.778 distinct entries
12.) BEARB_ZP (asc.)

Both indexes have 4 levels with 5 matching columns for both indexes. I
think the only mentionable difference are the number of leaf pages, but
with FETCH FIRST 1 ROW ONLY only 4 pages have to be read, however using
index OBIX0707, which has 4 levels as well, means reading 5 pages (4 index
pages and one data page).

All numbers are from RUNSTATS.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Michael N M'Cartney

Re: [DB2 NFM z/OS] Optimizer deficiency?
(in response to Walter Janißen)
Walter,
Whilst you do have OPTIMIZE FOR 1 ROWS and FETCH FIRST 1 ROW ONLY, you also have an ORDER BY. In order to Fetch the first row any qualifying rows must be sorted first. OPTIMIZE FOR 1 ROW does not tell DB2 there will definitely be only one row, just that the number of rows likely to qualify for the result set is likely to be very low.

When assessing the optimal access path DB2 can only use the statistics available in the catalog. When you say the stats below are from RUNSTATS, I am assuming you mean these are the stats in the DB2 catalog and they were updated using RUNSTATS rather than some other 3rd party utility ?

Also, it may just be the way it appears in the email but the number of leaf pages cannot possibly be 48.945, so I am assuming it is actually 48,945.

You also have not listed any stats from SYSCOLDIST or SYSCOLDISTSTATS so I don't whether this is because you don't have any or just didn't list them.

Anyway, given the info you have supplied, I would say the optimizer is choosing the path that it is because whilst there are 2 indexes, because the 3rd column in both indexes is not one of the SQL predicates, the only possible option using either of these indexes is a matching index scan or non-matching index scan. As the first 2 cols in both indexes are predicates, I am guessing the access path chosen in either case is likely to be a matching index scan matching on the 1st 2 columns only. Therefore when assessing which will be cheapest, DB2 will take account of index levels and number of leaf pages to estimate the average number of I/Os to retrieve the data. As there are more columns in OBIX3707, there are more leaf pages. The Stats you have supplied (SYSCOLDIST & DISTSTATS may tell a different story) suggest that on average for each value in A_OE_ART_SL, DB2 may have to access 1 quarter of the leaf pages. 1 quarter of 48,945 is a lot less than 1 quarter of 105,356 so DB2 chooses matching index scan of OBIX0707 as it will on average incur less I/O.

I suspect that even if you have got stats on SYSCOLDIST and SYSCOLDISTATS, the answer will still be that using OBIX0707 will, on average still result in less I/O.

If you were to remove the ORDER BY clause, you may find that DB2 does choose the other index because now FETCH FIRST 1 ROW would then mean DB2 really would only need to get 1 row as opposed to all qualifying rows and then sort them. However this may change the result because if there are multiple qualifying rows you may no longer get the 1st row in the collated sequence, it would just be the first row DB2 retrieved using the access path it happens to chose at the time, which means that if another index were introduced or DB2 chose to use tablespace scan for some reason you could very easily get a different result.



Thanks,

Mike M'Cartney
-----Original Message-----
From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf Of Walter Janißen
Sent: 13 December 2006 08:19
To: [login to unmask email]
Subject: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?

Hi

I wonder, why the optimizer doesn't use index only access for the following
query:

SELECT OEBZA_ART_SL, F_OE_NR, F_OE_KURZ_NAME
FROM db2.OBTB0707
WHERE A_OE_ART_SL = 'D'
AND A_OE_NR = ?
AND HIST_LNR = ?
AND F_OE_ART_SL = 'D'
AND OEBZA_ART_SL IN ('60', '63')
ORDER BY OEBZA_ART_SL
OPTIMIZE FOR 1 ROWS
FETCH FIRST 1 ROW ONLY
FOR FETCH ONLY

DB2 picks up index OBIX0707:

unique
clustering (ratio: 98%),
48.945 leaf pages, 4 levels
6.739.028 distinct entries
statstime is 2006-11-15-06.30.58.790391
key columns:
1.) A_OE_ART_SL (asc.) 4 distinct entries
2.) A_OE_NR (asc.) 683.777 distinct entries
3.) HIST_LNR (asc.) 2.049.877 distinct entries
4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
6.) F_OE_NR (asc.) 6.435.090 distinct entries
7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries

But there is another index, where index only is possible:

OBIX3707:

unique
not clustering (ratio: 98%),
105.356 leaf pages, 4 levels
6.742.778 distinct entries
statstime is 2006-11-25-00.30.23.653995
key columns:
1.) A_OE_ART_SL (asc.) 4 distinct entries
2.) A_OE_NR (asc.) 683.777 distinct entries
3.) HIST_LNR (asc.) 2.049.877 distinct entries
4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
6.) F_OE_NR (asc.) 6.435.090 distinct entries
7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
8.) F_OE_KURZ_NAME (asc.) 6.742.778 distinct entries
9.) ABLOESE_HIST_LNR (asc.) 6.742.778 distinct entries
10.) GB_DAT (asc.) 6.742.778 distinct entries
11.) GE_DAT (asc.) 6.742.778 distinct entries
12.) BEARB_ZP (asc.)

Both indexes have 4 levels with 5 matching columns for both indexes. I
think the only mentionable difference are the number of leaf pages, but
with FETCH FIRST 1 ROW ONLY only 4 pages have to be read, however using
index OBIX0707, which has 4 levels as well, means reading 5 pages (4 index
pages and one data page).

All numbers are from RUNSTATS.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Walter Janißen

Re: [DB2 NFM z/OS] Optimizer deficiency?
(in response to Michael N M'Cartney)
Mike

Thank you for your reply. May be I was a little bit short with my
informations. As I am from Germany, I forgot, that you work with decimal
point is comma, so you are right, 48.945 is really 48,945. Same holds for
all other numbers. The numbers behind the columns of the indexes are values
from SYSCOLDIST (TYPE C), so we collect all those numbers using the IBM
runstats.

The mentioned statement has no sort and has 5 matching columns and
furthermore, OBIX3707 has all the leading columns of index OBIX0707.

I don't know, what you mean with: "because the 3rd column in both indexes
is not one of the SQL predicates,". The 3rd column is HIST_LNR and this
column is a predicate of the query: AND HIST_LNR = ?

So: no sort is done, FETCH FIRST and OPTIMIZE can be taken into account. So
for me, taking OBIX3707 with index-only is cheaper than taking OBIX0707 and
reading an additional data page.



On Mon, 18 Dec 2006 11:37:54 -0000, M'Cartney, Michael N
<[login to unmask email]> wrote:

>Walter,
>Whilst you do have OPTIMIZE FOR 1 ROWS and FETCH FIRST 1 ROW ONLY, you
also have an ORDER BY. In order to Fetch the first row any qualifying rows
must be sorted first. OPTIMIZE FOR 1 ROW does not tell DB2 there will
definitely be only one row, just that the number of rows likely to qualify
for the result set is likely to be very low.
>
>When assessing the optimal access path DB2 can only use the statistics
available in the catalog. When you say the stats below are from RUNSTATS, I
am assuming you mean these are the stats in the DB2 catalog and they were
updated using RUNSTATS rather than some other 3rd party utility ?
>
>Also, it may just be the way it appears in the email but the number of
leaf pages cannot possibly be 48.945, so I am assuming it is actually
48,945.
>
>You also have not listed any stats from SYSCOLDIST or SYSCOLDISTSTATS so I
don't whether this is because you don't have any or just didn't list them.
>
>Anyway, given the info you have supplied, I would say the optimizer is
choosing the path that it is because whilst there are 2 indexes, because
the 3rd column in both indexes is not one of the SQL predicates, the only
possible option using either of these indexes is a matching index scan or
non-matching index scan. As the first 2 cols in both indexes are
predicates, I am guessing the access path chosen in either case is likely
to be a matching index scan matching on the 1st 2 columns only. Therefore
when assessing which will be cheapest, DB2 will take account of index
levels and number of leaf pages to estimate the average number of I/Os to
retrieve the data. As there are more columns in OBIX3707, there are more
leaf pages. The Stats you have supplied (SYSCOLDIST & DISTSTATS may tell a
different story) suggest that on average for each value in A_OE_ART_SL, DB2
may have to access 1 quarter of the leaf pages. 1 quarter of 48,945 is a
lot less than 1 quarter of 105,356 so DB2 chooses matching index scan of
OBIX0707 as it will on average incur less I/O.
>
>I suspect that even if you have got stats on SYSCOLDIST and SYSCOLDISTATS,
the answer will still be that using OBIX0707 will, on average still result
in less I/O.
>
>If you were to remove the ORDER BY clause, you may find that DB2 does
choose the other index because now FETCH FIRST 1 ROW would then mean DB2
really would only need to get 1 row as opposed to all qualifying rows and
then sort them. However this may change the result because if there are
multiple qualifying rows you may no longer get the 1st row in the collated
sequence, it would just be the first row DB2 retrieved using the access
path it happens to chose at the time, which means that if another index
were introduced or DB2 chose to use tablespace scan for some reason you
could very easily get a different result.
>
>
>
>Thanks,
>
>Mike M'Cartney
>-----Original Message-----
>From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf
Of Walter Janißen
>Sent: 13 December 2006 08:19
>To: [login to unmask email]
>Subject: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?
>
>Hi
>
>I wonder, why the optimizer doesn't use index only access for the following
>query:
>
>SELECT OEBZA_ART_SL, F_OE_NR, F_OE_KURZ_NAME
>FROM db2.OBTB0707
>WHERE A_OE_ART_SL = 'D'
>AND A_OE_NR = ?
>AND HIST_LNR = ?
>AND F_OE_ART_SL = 'D'
>AND OEBZA_ART_SL IN ('60', '63')
>ORDER BY OEBZA_ART_SL
>OPTIMIZE FOR 1 ROWS
>FETCH FIRST 1 ROW ONLY
>FOR FETCH ONLY
>
>DB2 picks up index OBIX0707:
>
>unique
>clustering (ratio: 98%),
>48.945 leaf pages, 4 levels
>6.739.028 distinct entries
>statstime is 2006-11-15-06.30.58.790391
>key columns:
>1.) A_OE_ART_SL (asc.) 4 distinct entries
>2.) A_OE_NR (asc.) 683.777 distinct entries
>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>
>But there is another index, where index only is possible:
>
>OBIX3707:
>
>unique
>not clustering (ratio: 98%),
>105.356 leaf pages, 4 levels
>6.742.778 distinct entries
>statstime is 2006-11-25-00.30.23.653995
>key columns:
>1.) A_OE_ART_SL (asc.) 4 distinct entries
>2.) A_OE_NR (asc.) 683.777 distinct entries
>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>8.) F_OE_KURZ_NAME (asc.) 6.742.778 distinct entries
>9.) ABLOESE_HIST_LNR (asc.) 6.742.778 distinct entries
>10.) GB_DAT (asc.) 6.742.778 distinct entries
>11.) GE_DAT (asc.) 6.742.778 distinct entries
>12.) BEARB_ZP (asc.)
>
>Both indexes have 4 levels with 5 matching columns for both indexes. I
>think the only mentionable difference are the number of leaf pages, but
>with FETCH FIRST 1 ROW ONLY only 4 pages have to be read, however using
>index OBIX0707, which has 4 levels as well, means reading 5 pages (4 index
>pages and one data page).
>
>All numbers are from RUNSTATS.
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Tom Moulder

Re: [DB2 NFM z/OS] Optimizer deficiency?
(in response to Walter Janißen)
Walter

Mike is right about one thing -- the order by clause. What I find curious
is that you have specified --> AND OEBZA_ART_SL IN ('60', '63') <-- as a
predicate, then order by on that column and additionally ask for only one
row. Given all that information, wouldn't it make more sense to just put
one value in the list? Without the order by clause, you still might get a
different result set from DB2 depending upon which row was accessed first,
but it would still be an expected value for this column.

Do you have column distribution statistics collected for the predicate
columns? Since the predicates are all equal operators, you might get a
different access path that way if the data was skewed. You could also use
Visual Explain to see these values (assuming that you have collected them)
which might help understand what the optimizer is doing and the reason for
the choices.

However, as long as the order by clause remains in the statement, DB2 will
be sorting the result set before selecting the one row requested. In fact,
as long as the IN clause mentioned above exists, there can never be
uniqueness in the result set which forces the sort.

Tom Moulder
TREX Associates, Inc.
9728 Delmonico Dr.
Keller, Tx. 76248-9559
+1 817 741-5549 Office
+1 817 741-5548 Fax
+1 682 558-6527 Mobile
[login to unmask email]
www.t-rex-associates.com

-----Original Message-----
From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf
Of Walter Jani ßen
Sent: Monday, December 18, 2006 7:28 AM
To: [login to unmask email]
Subject: Re: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?

Mike

Thank you for your reply. May be I was a little bit short with my
informations. As I am from Germany, I forgot, that you work with decimal
point is comma, so you are right, 48.945 is really 48,945. Same holds for
all other numbers. The numbers behind the columns of the indexes are values
from SYSCOLDIST (TYPE C), so we collect all those numbers using the IBM
runstats.

The mentioned statement has no sort and has 5 matching columns and
furthermore, OBIX3707 has all the leading columns of index OBIX0707.

I don't know, what you mean with: "because the 3rd column in both indexes
is not one of the SQL predicates,". The 3rd column is HIST_LNR and this
column is a predicate of the query: AND HIST_LNR = ?

So: no sort is done, FETCH FIRST and OPTIMIZE can be taken into account. So
for me, taking OBIX3707 with index-only is cheaper than taking OBIX0707 and
reading an additional data page.



On Mon, 18 Dec 2006 11:37:54 -0000, M'Cartney, Michael N
<[login to unmask email]> wrote:

>Walter,
>Whilst you do have OPTIMIZE FOR 1 ROWS and FETCH FIRST 1 ROW ONLY, you
also have an ORDER BY. In order to Fetch the first row any qualifying rows
must be sorted first. OPTIMIZE FOR 1 ROW does not tell DB2 there will
definitely be only one row, just that the number of rows likely to qualify
for the result set is likely to be very low.
>
>When assessing the optimal access path DB2 can only use the statistics
available in the catalog. When you say the stats below are from RUNSTATS, I
am assuming you mean these are the stats in the DB2 catalog and they were
updated using RUNSTATS rather than some other 3rd party utility ?
>
>Also, it may just be the way it appears in the email but the number of
leaf pages cannot possibly be 48.945, so I am assuming it is actually
48,945.
>
>You also have not listed any stats from SYSCOLDIST or SYSCOLDISTSTATS so I
don't whether this is because you don't have any or just didn't list them.
>
>Anyway, given the info you have supplied, I would say the optimizer is
choosing the path that it is because whilst there are 2 indexes, because
the 3rd column in both indexes is not one of the SQL predicates, the only
possible option using either of these indexes is a matching index scan or
non-matching index scan. As the first 2 cols in both indexes are
predicates, I am guessing the access path chosen in either case is likely
to be a matching index scan matching on the 1st 2 columns only. Therefore
when assessing which will be cheapest, DB2 will take account of index
levels and number of leaf pages to estimate the average number of I/Os to
retrieve the data. As there are more columns in OBIX3707, there are more
leaf pages. The Stats you have supplied (SYSCOLDIST & DISTSTATS may tell a
different story) suggest that on average for each value in A_OE_ART_SL, DB2
may have to access 1 quarter of the leaf pages. 1 quarter of 48,945 is a
lot less than 1 quarter of 105,356 so DB2 chooses matching index scan of
OBIX0707 as it will on average incur less I/O.
>
>I suspect that even if you have got stats on SYSCOLDIST and SYSCOLDISTATS,
the answer will still be that using OBIX0707 will, on average still result
in less I/O.
>
>If you were to remove the ORDER BY clause, you may find that DB2 does
choose the other index because now FETCH FIRST 1 ROW would then mean DB2
really would only need to get 1 row as opposed to all qualifying rows and
then sort them. However this may change the result because if there are
multiple qualifying rows you may no longer get the 1st row in the collated
sequence, it would just be the first row DB2 retrieved using the access
path it happens to chose at the time, which means that if another index
were introduced or DB2 chose to use tablespace scan for some reason you
could very easily get a different result.
>
>
>
>Thanks,
>
>Mike M'Cartney
>-----Original Message-----
>From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf
Of Walter Janißen
>Sent: 13 December 2006 08:19
>To: [login to unmask email]
>Subject: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?
>
>Hi
>
>I wonder, why the optimizer doesn't use index only access for the following
>query:
>
>SELECT OEBZA_ART_SL, F_OE_NR, F_OE_KURZ_NAME
>FROM db2.OBTB0707
>WHERE A_OE_ART_SL = 'D'
>AND A_OE_NR = ?
>AND HIST_LNR = ?
>AND F_OE_ART_SL = 'D'
>AND OEBZA_ART_SL IN ('60', '63')
>ORDER BY OEBZA_ART_SL
>OPTIMIZE FOR 1 ROWS
>FETCH FIRST 1 ROW ONLY
>FOR FETCH ONLY
>
>DB2 picks up index OBIX0707:
>
>unique
>clustering (ratio: 98%),
>48.945 leaf pages, 4 levels
>6.739.028 distinct entries
>statstime is 2006-11-15-06.30.58.790391
>key columns:
>1.) A_OE_ART_SL (asc.) 4 distinct entries
>2.) A_OE_NR (asc.) 683.777 distinct entries
>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>
>But there is another index, where index only is possible:
>
>OBIX3707:
>
>unique
>not clustering (ratio: 98%),
>105.356 leaf pages, 4 levels
>6.742.778 distinct entries
>statstime is 2006-11-25-00.30.23.653995
>key columns:
>1.) A_OE_ART_SL (asc.) 4 distinct entries
>2.) A_OE_NR (asc.) 683.777 distinct entries
>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>8.) F_OE_KURZ_NAME (asc.) 6.742.778 distinct entries
>9.) ABLOESE_HIST_LNR (asc.) 6.742.778 distinct entries
>10.) GB_DAT (asc.) 6.742.778 distinct entries
>11.) GE_DAT (asc.) 6.742.778 distinct entries
>12.) BEARB_ZP (asc.)
>
>Both indexes have 4 levels with 5 matching columns for both indexes. I
>think the only mentionable difference are the number of leaf pages, but
>with FETCH FIRST 1 ROW ONLY only 4 pages have to be read, however using
>index OBIX0707, which has 4 levels as well, means reading 5 pages (4 index
>pages and one data page).
>
>All numbers are from RUNSTATS.
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm

----------------------------------------------------------------------------
-----
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home
page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select
"Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org.
The IDUG List Admins can be reached at [login to unmask email] Find
out the latest on IDUG conferences at http://conferences.idug.org/index.cfm


--
No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.1.409 / Virus Database: 268.15.22/590 - Release Date: 12/16/2006

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Terry Purcell

Re: [DB2 NFM z/OS] Optimizer deficiency?
(in response to Tom Moulder)
Hi Tom, Mike,

Sort avoidance is possible from this index, even with the IN list. And the
ORDER BY is required if the requirement is to guarantee that a value of 60
is to be returned if exists, otherwise 63.

Walter,

I would first check that you have PK32672 applied. If you do, then I would
suggest opening a PMR.

Regards
Terry Purcell

On Mon, 18 Dec 2006 08:57:19 -0600, Tom Moulder
<[login to unmask email]> wrote:

>Walter
>
>Mike is right about one thing -- the order by clause. What I find curious
>is that you have specified --> AND OEBZA_ART_SL IN ('60', '63') <-- as a
>predicate, then order by on that column and additionally ask for only one
>row. Given all that information, wouldn't it make more sense to just put
>one value in the list? Without the order by clause, you still might get a
>different result set from DB2 depending upon which row was accessed first,
>but it would still be an expected value for this column.
>
>Do you have column distribution statistics collected for the predicate
>columns? Since the predicates are all equal operators, you might get a
>different access path that way if the data was skewed. You could also use
>Visual Explain to see these values (assuming that you have collected them)
>which might help understand what the optimizer is doing and the reason for
>the choices.
>
>However, as long as the order by clause remains in the statement, DB2 will
>be sorting the result set before selecting the one row requested. In
fact,
>as long as the IN clause mentioned above exists, there can never be
>uniqueness in the result set which forces the sort.
>
>Tom Moulder
>TREX Associates, Inc.
>9728 Delmonico Dr.
>Keller, Tx. 76248-9559
>+1 817 741-5549 Office
>+1 817 741-5548 Fax
>+1 682 558-6527 Mobile
>[login to unmask email]
>www.t-rex-associates.com
>
>-----Original Message-----
>From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf
>Of Walter Jani ßen
>Sent: Monday, December 18, 2006 7:28 AM
>To: [login to unmask email]
>Subject: Re: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?
>
>Mike
>
>Thank you for your reply. May be I was a little bit short with my
>informations. As I am from Germany, I forgot, that you work with decimal
>point is comma, so you are right, 48.945 is really 48,945. Same holds for
>all other numbers. The numbers behind the columns of the indexes are
values
>from SYSCOLDIST (TYPE C), so we collect all those numbers using the IBM
>runstats.
>
>The mentioned statement has no sort and has 5 matching columns and
>furthermore, OBIX3707 has all the leading columns of index OBIX0707.
>
>I don't know, what you mean with: "because the 3rd column in both indexes
>is not one of the SQL predicates,". The 3rd column is HIST_LNR and this
>column is a predicate of the query: AND HIST_LNR = ?
>
>So: no sort is done, FETCH FIRST and OPTIMIZE can be taken into account.
So
>for me, taking OBIX3707 with index-only is cheaper than taking OBIX0707
and
>reading an additional data page.
>
>
>
>On Mon, 18 Dec 2006 11:37:54 -0000, M'Cartney, Michael N
><[login to unmask email]> wrote:
>
>>Walter,
>>Whilst you do have OPTIMIZE FOR 1 ROWS and FETCH FIRST 1 ROW ONLY, you
>also have an ORDER BY. In order to Fetch the first row any qualifying rows
>must be sorted first. OPTIMIZE FOR 1 ROW does not tell DB2 there will
>definitely be only one row, just that the number of rows likely to qualify
>for the result set is likely to be very low.
>>
>>When assessing the optimal access path DB2 can only use the statistics
>available in the catalog. When you say the stats below are from RUNSTATS,
I
>am assuming you mean these are the stats in the DB2 catalog and they were
>updated using RUNSTATS rather than some other 3rd party utility ?
>>
>>Also, it may just be the way it appears in the email but the number of
>leaf pages cannot possibly be 48.945, so I am assuming it is actually
>48,945.
>>
>>You also have not listed any stats from SYSCOLDIST or SYSCOLDISTSTATS so
I
>don't whether this is because you don't have any or just didn't list them.
>>
>>Anyway, given the info you have supplied, I would say the optimizer is
>choosing the path that it is because whilst there are 2 indexes, because
>the 3rd column in both indexes is not one of the SQL predicates, the only
>possible option using either of these indexes is a matching index scan or
>non-matching index scan. As the first 2 cols in both indexes are
>predicates, I am guessing the access path chosen in either case is likely
>to be a matching index scan matching on the 1st 2 columns only. Therefore
>when assessing which will be cheapest, DB2 will take account of index
>levels and number of leaf pages to estimate the average number of I/Os to
>retrieve the data. As there are more columns in OBIX3707, there are more
>leaf pages. The Stats you have supplied (SYSCOLDIST & DISTSTATS may tell a
>different story) suggest that on average for each value in A_OE_ART_SL,
DB2
>may have to access 1 quarter of the leaf pages. 1 quarter of 48,945 is a
>lot less than 1 quarter of 105,356 so DB2 chooses matching index scan of
>OBIX0707 as it will on average incur less I/O.
>>
>>I suspect that even if you have got stats on SYSCOLDIST and
SYSCOLDISTATS,
>the answer will still be that using OBIX0707 will, on average still result
>in less I/O.
>>
>>If you were to remove the ORDER BY clause, you may find that DB2 does
>choose the other index because now FETCH FIRST 1 ROW would then mean DB2
>really would only need to get 1 row as opposed to all qualifying rows and
>then sort them. However this may change the result because if there are
>multiple qualifying rows you may no longer get the 1st row in the collated
>sequence, it would just be the first row DB2 retrieved using the access
>path it happens to chose at the time, which means that if another index
>were introduced or DB2 chose to use tablespace scan for some reason you
>could very easily get a different result.
>>
>>
>>
>>Thanks,
>>
>>Mike M'Cartney
>>-----Original Message-----
>>From: DB2 Data Base Discussion List [mailto:[login to unmask email] On
Behalf
>Of Walter Janißen
>>Sent: 13 December 2006 08:19
>>To: [login to unmask email]
>>Subject: [DB2-L] [DB2 NFM z/OS] Optimizer deficiency?
>>
>>Hi
>>
>>I wonder, why the optimizer doesn't use index only access for the
following
>>query:
>>
>>SELECT OEBZA_ART_SL, F_OE_NR, F_OE_KURZ_NAME
>>FROM db2.OBTB0707
>>WHERE A_OE_ART_SL = 'D'
>>AND A_OE_NR = ?
>>AND HIST_LNR = ?
>>AND F_OE_ART_SL = 'D'
>>AND OEBZA_ART_SL IN ('60', '63')
>>ORDER BY OEBZA_ART_SL
>>OPTIMIZE FOR 1 ROWS
>>FETCH FIRST 1 ROW ONLY
>>FOR FETCH ONLY
>>
>>DB2 picks up index OBIX0707:
>>
>>unique
>>clustering (ratio: 98%),
>>48.945 leaf pages, 4 levels
>>6.739.028 distinct entries
>>statstime is 2006-11-15-06.30.58.790391
>>key columns:
>>1.) A_OE_ART_SL (asc.) 4 distinct entries
>>2.) A_OE_NR (asc.) 683.777 distinct entries
>>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>>
>>But there is another index, where index only is possible:
>>
>>OBIX3707:
>>
>>unique
>>not clustering (ratio: 98%),
>>105.356 leaf pages, 4 levels
>>6.742.778 distinct entries
>>statstime is 2006-11-25-00.30.23.653995
>>key columns:
>>1.) A_OE_ART_SL (asc.) 4 distinct entries
>>2.) A_OE_NR (asc.) 683.777 distinct entries
>>3.) HIST_LNR (asc.) 2.049.877 distinct entries
>>4.) OEBZA_ART_SL (asc.) 5.649.897 distinct entries
>>5.) F_OE_ART_SL (asc.) 5.649.902 distinct entries
>>6.) F_OE_NR (asc.) 6.435.090 distinct entries
>>7.) AUFG_SPEZI_SL (asc.) 6.742.778 distinct entries
>>8.) F_OE_KURZ_NAME (asc.) 6.742.778 distinct entries
>>9.) ABLOESE_HIST_LNR (asc.) 6.742.778 distinct entries
>>10.) GB_DAT (asc.) 6.742.778 distinct entries
>>11.) GE_DAT (asc.) 6.742.778 distinct entries
>>12.) BEARB_ZP (asc.)
>>
>>Both indexes have 4 levels with 5 matching columns for both indexes. I
>>think the only mentionable difference are the number of leaf pages, but
>>with FETCH FIRST 1 ROW ONLY only 4 pages have to be read, however using
>>index OBIX0707, which has 4 levels as well, means reading 5 pages (4
index
>>pages and one data page).
>>
>>All numbers are from RUNSTATS.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm