Interesting Optimizer Problem

Sheldon Rich

Interesting Optimizer Problem
This is an actual production problem. Consider the following table and SQLs:


Table: LOAN_TAB 3,005 rows
LOAN_ID dec(9) cardinality 2,721
CUST_ID int cardinality 2,425
COLX char(20)
Etc....



The table LOAN_TAB has two Indexes both non-unique. Different tests were
run with clustering on each index:


Index 1:
LOAN_ID asc
CUST_ID asc

Index 2:
CUST_ID asc
LOAN_ID asc

The SQL:

Select min(LOAN_ID) from LOAN_TAB
Where CUST_ID = :HV1
And LOAN_ID > :HV2 ;

The optimizer invariably chooses Index 1 for this SQL. If it would use Index 2
it would find only one or two rows matching the CUST_ID. Using Index 1 it
must go through many LOAN_ID's until it finds a match on CUST_ID. And we
are clearly getting very inefficient results with Index 1. Why doesn't the
optimizer choose Index 2??

My understanding is that "LOAN_ID >" has a filter factor of .3
whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
the preferred predicate.

Who can explain this?

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Nick Dordea

Re: Interesting Optimizer Problem
(in response to Sheldon Rich)
Hi Sheldon,

The issue is related to the so-called "sarg-able" predicates.

The sql query where clause hints that :
cust_id = :HV1 is sargable whereas
loan_id > :HV2 is not sarg-able .

Subsequently, the optimizer is going to use that index for which cust_id is a
proper sub_index ( i.e. if an index is defined as col-x, col-y, col-z, ....
a proper sub-index is any subset of the key that is
col-x
OR
col-x,col-y
OR
col-x,col-y,col-z
... and so on )

In our case the sub-index is cust_id ( because is sargable) and the only
index for which cust-id is a proper sub-index is index-2 ( cust-id, loan-id
), so the index can be used to search the index to select the rows having
cust-id = :HV1

Index-1( loan-id, cust-id) is not used/selected because cust-id is a proper
sub-index ( you can not search on the index-1 structure knowing cust-id
because the index was built using <loan-id><cust-id> values ....).

My 2 cents ....

nd





On Sun, 30 Dec 2007 13:50:23 +0000, Sheldon Rich <[login to unmask email]> wrote:

>This is an actual production problem. Consider the following table and SQLs:
>
>
>Table: LOAN_TAB 3,005 rows
> LOAN_ID dec(9) cardinality 2,721
> CUST_ID int cardinality 2,425
> COLX char(20)
> Etc....
>
>
>
>The table LOAN_TAB has two Indexes both non-unique. Different tests were
>run with clustering on each index:
>
>
>Index 1:
>LOAN_ID asc
>CUST_ID asc
>
>Index 2:
>CUST_ID asc
>LOAN_ID asc
>
>The SQL:
>
>Select min(LOAN_ID) from LOAN_TAB
> Where CUST_ID = :HV1
> And LOAN_ID > :HV2 ;
>
>The optimizer invariably chooses Index 1 for this SQL. If it would use Index 2
>it would find only one or two rows matching the CUST_ID. Using Index 1 it
>must go through many LOAN_ID's until it finds a match on CUST_ID. And we
>are clearly getting very inefficient results with Index 1. Why doesn't the
>optimizer choose Index 2??
>
>My understanding is that "LOAN_ID >" has a filter factor of .3
>whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
>the preferred predicate.
>
>Who can explain this?
>
>The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L
list archives, the FAQ, and delivery preferences are at
http://www.idug.org/lsidug under the Listserv tab. While at the site, you
can also access the IDUG Online Learning Center, Tech Library and Code
Place, see the latest IDUG conference information, and much more. If you
have not yet signed up for Basic Membership in IDUG, available at no cost,
click on Member Services at http://www.idug.org/lsms

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Renu Sharma

Re: Interesting Optimizer Problem
(in response to Nick Dordea)
Try giving bogus predicates to see if it takes right index
(in other words force db2 to choose right index.
Renu



**************************************See AOL's top rated recipes
(http://food.aol.com/top-rated-recipes?NCID=aoltop00030000000004)

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Raey

Re: Interesting Optimizer Problem
(in response to Renu Sharma)
Do you have a COL Group runstats on the two columns ? May be that will give the optimizer more info and then decided to chose the right index..

cheers!!!
AK

Sheldon Rich <[login to unmask email]> wrote:
This is an actual production problem. Consider the following table and SQLs:


Table: LOAN_TAB 3,005 rows
LOAN_ID dec(9) cardinality 2,721
CUST_ID int cardinality 2,425
COLX char(20)
Etc….



The table LOAN_TAB has two Indexes both non-unique. Different tests were
run with clustering on each index:


Index 1:
LOAN_ID asc
CUST_ID asc

Index 2:
CUST_ID asc
LOAN_ID asc

The SQL:

Select min(LOAN_ID) from LOAN_TAB
Where CUST_ID = :HV1
And LOAN_ID > :HV2 ;

The optimizer invariably chooses Index 1 for this SQL. If it would use Index 2
it would find only one or two rows matching the CUST_ID. Using Index 1 it
must go through many LOAN_ID's until it finds a match on CUST_ID. And we
are clearly getting very inefficient results with Index 1. Why doesn't the
optimizer choose Index 2??

My understanding is that "LOAN_ID >" has a filter factor of .3
whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
the preferred predicate.

Who can explain this?

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms



---------------------------------
Looking for last minute shopping deals? Find them fast with Yahoo! Search.

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Sysdba_AHE/CORP/TPG

Re: Interesting Optimizer Problem
(in response to Raey)
A possible explanation might be that the host variable HV1 could have the
wrong data type for CUST_ID.

HTH

Neil Price
TNT Express ICS, UK





Sheldon Rich <[login to unmask email]>
Sent by: DB2 Data Base Discussion List <[login to unmask email]>
30/12/2007 13:50
Please respond to DB2 Database Discussion list at IDUG

To: [login to unmask email]
cc:
Subject: [DB2-L] Interesting Optimizer Problem


This is an actual production problem. Consider the following table and
SQLs:


Table: LOAN_TAB 3,005 rows
LOAN_ID dec(9) cardinality 2,721
CUST_ID int cardinality 2,425
COLX char(20)
Etc....

The table LOAN_TAB has two Indexes both non-unique. Different tests were
run with clustering on each index:

Index 1:
LOAN_ID asc
CUST_ID asc

Index 2:
CUST_ID asc
LOAN_ID asc

The SQL:

Select min(LOAN_ID) from LOAN_TAB
Where CUST_ID = :HV1
And LOAN_ID > :HV2 ;

The optimizer invariably chooses Index 1 for this SQL. If it would use
Index 2
it would find only one or two rows matching the CUST_ID. Using Index 1 it

must go through many LOAN_ID's until it finds a match on CUST_ID. And we
are clearly getting very inefficient results with Index 1. Why doesn't
the
optimizer choose Index 2??

My understanding is that "LOAN_ID >" has a filter factor of .3
whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
the preferred predicate.

Who can explain this?

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L
list archives, the FAQ, and delivery preferences are at
http://www.idug.org/lsidug under the Listserv tab. While at the site, you
can also access the IDUG Online Learning Center, Tech Library and Code
Place, see the latest IDUG conference information, and much more. If you
have not yet signed up for Basic Membership in IDUG, available at no cost,
click on Member Services at http://www.idug.org/lsms




---------------------------------------------------------------------------------------------------------------
This message and any attachment are confidential and may be privileged or otherwise protected from disclosure.
If you are not the intended recipient, please telephone or email the sender and delete this message and any attachment from your system.
If you are not the intended recipient you must not copy this message or attachment or disclose the contents to any other person.
Please consider the environmental impact before printing this document and its attachment(s). Print black and white and double-sided where possible.
------------------------------------------------------------------------------

Tom Moulder

Re: Interesting Optimizer Problem
(in response to Sysdba_AHE/CORP/TPG)
IF this is a Version 8 subsystem, then this will change the optimizer choice. Prior to version 8 you would have to use DSTATS or some ISV statistics package to populate the distribution statistics in the catalog like Version 8 does through Runstats.



Tom Moulder



_____

From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf Of Ayalew Kassa
Sent: Sunday, December 30, 2007 2:47 PM
To: [login to unmask email]
Subject: Re: [DB2-L] Interesting Optimizer Problem



Do you have a COL Group runstats on the two columns ? May be that will give the optimizer more info and then decided to chose the right index..



cheers!!!

AK

Sheldon Rich <[login to unmask email]> wrote:

This is an actual production problem. Consider the following table and SQLs:


Table: LOAN_TAB 3,005 rows
LOAN_ID dec(9) cardinality 2,721
CUST_ID int cardinality 2,425
COLX char(20)
Etcâe¦.



The table LOAN_TAB has two Indexes both non-unique. Different tests were
run with clustering on each index:


Index 1:
LOAN_ID asc
CUST_ID asc

Index 2:
CUST_ID asc
LOAN_ID asc

The SQL:

Select min(LOAN_ID) from LOAN_TAB
Where CUST_ID = :HV1
And LOAN_ID > :HV2 ;

The optimizer invariably chooses Index 1 for this SQL. If it would use Index 2
it would find only one or two rows matching the CUST_ID. Using Index 1 it
must go through many LOAN_ID's until it finds a match on CUST_ID. And we
are clearly getting very inefficient results with Index 1. Why doesn't the
optimizer choose Index 2??

My understanding is that "LOAN_ID >" has a filter factor of .3
whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
the preferred predicate.

Who can explain this?





No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.516 / Virus Database: 269.17.12/1203 - Release Date: 12/30/2007 11:27 AM



The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Terry Purcell

Re: Interesting Optimizer Problem
(in response to Tom Moulder)
Sheldon,

I suggest you open a PMR on this, as this case needs further investigation.

Regards
Terry Purcell

On Sun, 30 Dec 2007 13:50:23 +0000, Sheldon Rich <[login to unmask email]>
wrote:

>This is an actual production problem. Consider the following table and SQLs:
>
>
>Table: LOAN_TAB 3,005 rows
> LOAN_ID dec(9) cardinality 2,721
> CUST_ID int cardinality 2,425
> COLX char(20)
> Etc....
>
>
>
>The table LOAN_TAB has two Indexes both non-unique. Different tests were
>run with clustering on each index:
>
>
>Index 1:
>LOAN_ID asc
>CUST_ID asc
>
>Index 2:
>CUST_ID asc
>LOAN_ID asc
>
>The SQL:
>
>Select min(LOAN_ID) from LOAN_TAB
> Where CUST_ID = :HV1
> And LOAN_ID > :HV2 ;
>
>The optimizer invariably chooses Index 1 for this SQL. If it would use Index 2
>it would find only one or two rows matching the CUST_ID. Using Index 1 it
>must go through many LOAN_ID's until it finds a match on CUST_ID. And we
>are clearly getting very inefficient results with Index 1. Why doesn't the
>optimizer choose Index 2??
>
>My understanding is that "LOAN_ID >" has a filter factor of .3
>whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
>the preferred predicate.
>
>Who can explain this?
>

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms

Meir Zohar

Re: Interesting Optimizer Problem
(in response to Terry Purcell)
Sheldon

I'm assuming that your CUST_ID is the customer's National ID number.

Is it (and :HV1) actually defined in the application as an INT and not a
Dec(9) or similar - thus possibly causing it to be non-indexable?

Regards,

Meir Zohar
CISSP, IBM Certified DBA for DB2 for z/OS

Tel: +972 3 5747860
Fax: +972 3 5747864
Mob: +972 54 5747350
email: [login to unmask email]

-----Original Message-----
From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf
Of Terry Purcell
Sent: Monday, December 31, 2007 6:11 PM
To: [login to unmask email]
Subject: Re: [DB2-L] Interesting Optimizer Problem

Sheldon,

I suggest you open a PMR on this, as this case needs further investigation.

Regards
Terry Purcell

On Sun, 30 Dec 2007 13:50:23 +0000, Sheldon Rich <[login to unmask email]>
wrote:

>This is an actual production problem. Consider the following table and
SQLs:
>
>
>Table: LOAN_TAB 3,005 rows
> LOAN_ID dec(9) cardinality 2,721
> CUST_ID int cardinality 2,425
> COLX char(20)
> Etc..
>
>
>
>The table LOAN_TAB has two Indexes both non-unique. Different tests were
>run with clustering on each index:
>
>
>Index 1:
>LOAN_ID asc
>CUST_ID asc
>
>Index 2:
>CUST_ID asc
>LOAN_ID asc
>
>The SQL:
>
>Select min(LOAN_ID) from LOAN_TAB
> Where CUST_ID = :HV1
> And LOAN_ID > :HV2 ;
>
>The optimizer invariably chooses Index 1 for this SQL. If it would use
Index 2
>it would find only one or two rows matching the CUST_ID. Using Index 1 it
>must go through many LOAN_ID's until it finds a match on CUST_ID. And we
>are clearly getting very inefficient results with Index 1. Why doesn't the
>optimizer choose Index 2??
>
>My understanding is that "LOAN_ID >" has a filter factor of .3
>whereas "CUST_ID=" has a filter factor of about .0004, making "CUST_ID="
>the preferred predicate.
>
>Who can explain this?
>

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list
archives, the FAQ, and delivery preferences are at
http://www.idug.org/lsidug under the Listserv tab. While at the site, you
can also access the IDUG Online Learning Center, Tech Library and Code
Place, see the latest IDUG conference information, and much more. If you
have not yet signed up for Basic Membership in IDUG, available at no cost,
click on Member Services at http://www.idug.org/lsms

The IDUG DB2-L Listserv is only part of your membership in IDUG. DB2-L list archives, the FAQ, and delivery preferences are at http://www.idug.org/lsidug under the Listserv tab. While at the site, you can also access the IDUG Online Learning Center, Tech Library and Code Place, see the latest IDUG conference information, and much more. If you have not yet signed up for Basic Membership in IDUG, available at no cost, click on Member Services at http://www.idug.org/lsms