Indexing Like clauses

Myron Miller

Indexing Like clauses
According to the DB2 manual for V7, Like clauses are stage 1 and will use an
index if available. But what I've found is that it only uses an index if its
an index only request. Such as the following:

select name from tablea
where name like '%abc%';

If you do something like this, you get a tablespace scan:
select name, address, state from tablea
where name like '%abc%';

and tablea is defined as
id int,
name varchar 100
address char 30
state char 2
var1 char 20
var2 char 10
var3 int;

Index 1 unique id
index 2 name

The name value is not the first few characters of the name field but is within
the field, usually in the first thirty or so but not col1.

I have a table that looks more or less like this with about 750,000 rows that
I've been testing this type of query on and everytime I request columns that
are not in the index so that I'd get an index/data retrieval I get tablespace
scans. I've tried this with complete detailed stats on all columns.

Does anybody have any ideas why DB2 doesn't perform index scans when there is a
index/data retrievals versus just index only retrievals?

I've got a PMR open with IBM but haven't heard much from them.

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com



Dave Nance

Re: Indexing Like clauses
(in response to Myron Miller)
Myron,
The caveat on index useage by a like statement is you have to provide the first character(s). In your case you are not providing the first character(s) of the column, so you get non-matching index scans, so when you are just looking for that column, you scan every row in the index to find your string. When you start asking for columns that are not in the index, DB2 looks at it as being better to just scan every row in the tablespace. It seems to me that you would be better off, actually breaking the name out of this column and putting it into its own column, then you could get the matching index access that you are wanting.

Dave Nance
First Health Services, Corp.
(804)527-6841

>>> [login to unmask email] 12/20/02 10:44:26 AM >>>
According to the DB2 manual for V7, Like clauses are stage 1 and will use an
index if available. But what I've found is that it only uses an index if its
an index only request. Such as the following:

select name from tablea
where name like '%abc%';

If you do something like this, you get a tablespace scan:
select name, address, state from tablea
where name like '%abc%';

and tablea is defined as
id int,
name varchar 100
address char 30
state char 2
var1 char 20
var2 char 10
var3 int;

Index 1 unique id
index 2 name

The name value is not the first few characters of the name field but is within
the field, usually in the first thirty or so but not col1.

I have a table that looks more or less like this with about 750,000 rows that
I've been testing this type of query on and everytime I request columns that
are not in the index so that I'd get an index/data retrieval I get tablespace
scans. I've tried this with complete detailed stats on all columns.

Does anybody have any ideas why DB2 doesn't perform index scans when there is a
index/data retrievals versus just index only retrievals?

I've got a PMR open with IBM but haven't heard much from them.

__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com




This message, including any attachments, is intended solely for the use
of the named recipient(s) and may contain confidential and/or
privileged information. Any unauthorized review, use, disclosure or
distribution of this communication(s) is expressly prohibited.
If you are not the intended recipient, please contact the sender by
reply e-mail and destroy any and all copies of the original message.
Thank you.

Myron Miller

Re: Indexing Like clauses
(in response to Dave Nance)
Actually my example was much simplified. We can't break it into its own column
and actually satisfy all the requirements. Its been tried and it just won't
work without awful complex and innummerable subqueries.

Also, even when the first characters of the column match, it doesn't do the
index scan if index data apply. Only uses the index for INDEXONLY and no other
cases.

From what I can guess, it looks like for index/data queries, the DB2 optimizer
is always assuming that it will return more than 25% of the entire table in its
result set for each INDEX/DATA query and therefore a tablespace scan is more
efficient. Again, even here, I don't understand why as the table involved has
over 125 columns and almost 2k of data per row. So I'd think as long as less
than 50% of the table were returned that an INDEX SCAN would be more cost
effective.

I've tried OPTIMIZE for ONE ROW with no difference in results as well.

Myron
--- David Nance <[login to unmask email]> wrote:
> Myron,
> The caveat on index useage by a like statement is you have to provide the
> first character(s). In your case you are not providing the first character(s)
> of the column, so you get non-matching index scans, so when you are just
> looking for that column, you scan every row in the index to find your string.
> When you start asking for columns that are not in the index, DB2 looks at it
> as being better to just scan every row in the tablespace. It seems to me that
> you would be better off, actually breaking the name out of this column and
> putting it into its own column, then you could get the matching index access
> that you are wanting.
>
> Dave Nance
> First Health Services, Corp.
> (804)527-6841
>
> >>> [login to unmask email] 12/20/02 10:44:26 AM >>>
> According to the DB2 manual for V7, Like clauses are stage 1 and will use an
> index if available. But what I've found is that it only uses an index if its
> an index only request. Such as the following:
>
> select name from tablea
> where name like '%abc%';
>
> If you do something like this, you get a tablespace scan:
> select name, address, state from tablea
> where name like '%abc%';
>
> and tablea is defined as
> id int,
> name varchar 100
> address char 30
> state char 2
> var1 char 20
> var2 char 10
> var3 int;
>
> Index 1 unique id
> index 2 name
>
> The name value is not the first few characters of the name field but is
> within
> the field, usually in the first thirty or so but not col1.
>
> I have a table that looks more or less like this with about 750,000 rows that
> I've been testing this type of query on and everytime I request columns that
> are not in the index so that I'd get an index/data retrieval I get tablespace
> scans. I've tried this with complete detailed stats on all columns.
>
> Does anybody have any ideas why DB2 doesn't perform index scans when there is
> a
> index/data retrievals versus just index only retrievals?
>
> I've got a PMR open with IBM but haven't heard much from them.
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
>
>
>
>
>
> This message, including any attachments, is intended solely for the use
> of the named recipient(s) and may contain confidential and/or
> privileged information. Any unauthorized review, use, disclosure or
> distribution of this communication(s) is expressly prohibited.
> If you are not the intended recipient, please contact the sender by
> reply e-mail and destroy any and all copies of the original message.
> Thank you.
>


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com



[login to unmask email]

Re: Indexing Like clauses
(in response to Myron Miller)
Hey Myron,

I think what Dave meant was that you have to specify the like clause
like... 'value%'. You can't have the percent at the front.
It kinda goes along the same lines as having to have the first column
specified of a multiple column index to be able to get a matchcol. So in
the case where it's index only it will use the index, but if it has to go
to the data anyway then why mess with the index. I guess it could possibly
make use of some index screening. BTW, what is your matchcol when it is
using the index?

Regards,
Jay


Jay Reavill
[login to unmask email]
IBM Global Services
Tampa, Florida
Tel: (813) 356-5317, Tie Line 8-697-5317

-----------------------------------------------------
Happiness is not around the corner.
Happiness is the Contour (SVT) of the road.
-----------------------------------------------------
The early bird gets the worm,
but the second mouse gets the cheese.
-----------------------------------------------------




Myron Miller
<[login to unmask email] To: [login to unmask email]
OO.COM> cc:
Sent by: DB2 Data Subject: Re: Indexing Like clauses
Base Discussion
List
<[login to unmask email]
LASSOC.COM>


12/20/02 12:15 PM
Please respond to
DB2 Data Base
Discussion List





Actually my example was much simplified. We can't break it into its own
column
and actually satisfy all the requirements. Its been tried and it just
won't
work without awful complex and innummerable subqueries.

Also, even when the first characters of the column match, it doesn't do the
index scan if index data apply. Only uses the index for INDEXONLY and no
other
cases.

From what I can guess, it looks like for index/data queries, the DB2
optimizer
is always assuming that it will return more than 25% of the entire table in
its
result set for each INDEX/DATA query and therefore a tablespace scan is
more
efficient. Again, even here, I don't understand why as the table involved
has
over 125 columns and almost 2k of data per row. So I'd think as long as
less
than 50% of the table were returned that an INDEX SCAN would be more cost
effective.

I've tried OPTIMIZE for ONE ROW with no difference in results as well.

Myron
--- David Nance <[login to unmask email]> wrote:
> Myron,
> The caveat on index useage by a like statement is you have to provide
the
> first character(s). In your case you are not providing the first
character(s)
> of the column, so you get non-matching index scans, so when you are just
> looking for that column, you scan every row in the index to find your
string.
> When you start asking for columns that are not in the index, DB2 looks at
it
> as being better to just scan every row in the tablespace. It seems to me
that
> you would be better off, actually breaking the name out of this column
and
> putting it into its own column, then you could get the matching index
access
> that you are wanting.
>
> Dave Nance
> First Health Services, Corp.
> (804)527-6841
>
> >>> [login to unmask email] 12/20/02 10:44:26 AM >>>
> According to the DB2 manual for V7, Like clauses are stage 1 and will
use an
> index if available. But what I've found is that it only uses an index if
its
> an index only request. Such as the following:
>
> select name from tablea
> where name like '%abc%';
>
> If you do something like this, you get a tablespace scan:
> select name, address, state from tablea
> where name like '%abc%';
>
> and tablea is defined as
> id int,
> name varchar 100
> address char 30
> state char 2
> var1 char 20
> var2 char 10
> var3 int;
>
> Index 1 unique id
> index 2 name
>
> The name value is not the first few characters of the name field but is
> within
> the field, usually in the first thirty or so but not col1.
>
> I have a table that looks more or less like this with about 750,000 rows
that
> I've been testing this type of query on and everytime I request columns
that
> are not in the index so that I'd get an index/data retrieval I get
tablespace
> scans. I've tried this with complete detailed stats on all columns.
>
> Does anybody have any ideas why DB2 doesn't perform index scans when
there is
> a
> index/data retrievals versus just index only retrievals?
>
> I've got a PMR open with IBM but haven't heard much from them.
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
>
>
the
> DB2-L webpage at http://listserv.ylassoc.com. The owners of the list can
be
>
>
> This message, including any attachments, is intended solely for the use
> of the named recipient(s) and may contain confidential and/or
> privileged information. Any unauthorized review, use, disclosure or
> distribution of this communication(s) is expressly prohibited.
> If you are not the intended recipient, please contact the sender by
> reply e-mail and destroy any and all copies of the original message.
> Thank you.
>


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com



the DB2-L webpage at http://listserv.ylassoc.com. The owners of the list
can



Myron Miller

Re: Indexing Like clauses
(in response to damcon2@US.IBM.COM)
Actually, you can have the % at the front. And for an index only, you will get
an index match of 1 and indexscan.

From what Pat Bossman tells me, the problem is that there aren't enough stats
for DB2 to use the index if there is an index/data search, unless I specify the
where clause as NAME = :HV. Then there will be an index scan and data
retrieval. DB2 thinks that a tablespace scan is more efficient than a index
scan and subsequent random table I/O. Random I/O is very costly and DB2 tries
to avoid it at all cost, thus the preference for tablespace scans.

Pat suggested four alternatives:
1) using a host variable:
2) using OPTHINT to fool DB2 into using the index
3) Partitioning the tablespace into multiple parts, thus achieving IO
Parallelism
4) Using the DB2 Text Extender to accomplish this.

The last suggestion actually intrigues me the most. I'm going to try
implementing that this week.

Myron
--- Jackson Reavill <[login to unmask email]> wrote:
> Hey Myron,
>
> I think what Dave meant was that you have to specify the like clause
> like... 'value%'. You can't have the percent at the front.
> It kinda goes along the same lines as having to have the first column
> specified of a multiple column index to be able to get a matchcol. So in
> the case where it's index only it will use the index, but if it has to go
> to the data anyway then why mess with the index. I guess it could possibly
> make use of some index screening. BTW, what is your matchcol when it is
> using the index?
>
> Regards,
> Jay
>
>
> Jay Reavill
> [login to unmask email]
> IBM Global Services
> Tampa, Florida
> Tel: (813) 356-5317, Tie Line 8-697-5317
>
> -----------------------------------------------------
> Happiness is not around the corner.
> Happiness is the Contour (SVT) of the road.
> -----------------------------------------------------
> The early bird gets the worm,
> but the second mouse gets the cheese.
> -----------------------------------------------------
>
>
>
>
> Myron Miller
> <[login to unmask email] To:
> [login to unmask email]
> OO.COM> cc:
> Sent by: DB2 Data Subject: Re: Indexing Like
> clauses
> Base Discussion
> List
> <[login to unmask email]
> LASSOC.COM>
>
>
> 12/20/02 12:15 PM
> Please respond to
> DB2 Data Base
> Discussion List
>
>
>
>
>
> Actually my example was much simplified. We can't break it into its own
> column
> and actually satisfy all the requirements. Its been tried and it just
> won't
> work without awful complex and innummerable subqueries.
>
> Also, even when the first characters of the column match, it doesn't do the
> index scan if index data apply. Only uses the index for INDEXONLY and no
> other
> cases.
>
> From what I can guess, it looks like for index/data queries, the DB2
> optimizer
> is always assuming that it will return more than 25% of the entire table in
> its
> result set for each INDEX/DATA query and therefore a tablespace scan is
> more
> efficient. Again, even here, I don't understand why as the table involved
> has
> over 125 columns and almost 2k of data per row. So I'd think as long as
> less
> than 50% of the table were returned that an INDEX SCAN would be more cost
> effective.
>
> I've tried OPTIMIZE for ONE ROW with no difference in results as well.
>
> Myron
> --- David Nance <[login to unmask email]> wrote:
> > Myron,
> > The caveat on index useage by a like statement is you have to provide
> the
> > first character(s). In your case you are not providing the first
> character(s)
> > of the column, so you get non-matching index scans, so when you are just
> > looking for that column, you scan every row in the index to find your
> string.
> > When you start asking for columns that are not in the index, DB2 looks at
> it
> > as being better to just scan every row in the tablespace. It seems to me
> that
> > you would be better off, actually breaking the name out of this column
> and
> > putting it into its own column, then you could get the matching index
> access
> > that you are wanting.
> >
> > Dave Nance
> > First Health Services, Corp.
> > (804)527-6841
> >
> > >>> [login to unmask email] 12/20/02 10:44:26 AM >>>
> > According to the DB2 manual for V7, Like clauses are stage 1 and will
> use an
> > index if available. But what I've found is that it only uses an index if
> its
> > an index only request. Such as the following:
> >
> > select name from tablea
> > where name like '%abc%';
> >
> > If you do something like this, you get a tablespace scan:
> > select name, address, state from tablea
> > where name like '%abc%';
> >
> > and tablea is defined as
> > id int,
> > name varchar 100
> > address char 30
> > state char 2
> > var1 char 20
> > var2 char 10
> > var3 int;
> >
> > Index 1 unique id
> > index 2 name
> >
> > The name value is not the first few characters of the name field but is
> > within
> > the field, usually in the first thirty or so but not col1.
> >
> > I have a table that looks more or less like this with about 750,000 rows
> that
> > I've been testing this type of query on and everytime I request columns
> that
> > are not in the index so that I'd get an index/data retrieval I get
> tablespace
> > scans. I've tried this with complete detailed stats on all columns.
> >
> > Does anybody have any ideas why DB2 doesn't perform index scans when
> there is
> > a
> > index/data retrievals versus just index only retrievals?
> >
> > I've got a PMR open with IBM but haven't heard much from them.
> >
> > __________________________________________________
> > Do you Yahoo!?
> > Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> > http://mailplus.yahoo.com
> >
> >
> >
> the
> > DB2-L webpage at http://listserv.ylassoc.com. The owners of the list can
> be
> >
> >
> > This message, including any attachments, is intended solely for the use
> > of the named recipient(s) and may contain confidential and/or
> > privileged information. Any unauthorized review, use, disclosure or
> > distribution of this communication(s) is expressly prohibited.
> > If you are not the intended recipient, please contact the sender by
> > reply e-mail and destroy any and all copies of the original message.
> > Thank you.
> >
>
>
> __________________________________________________
> Do you Yahoo!?
> Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
> http://mailplus.yahoo.com
>
>
>
> the DB2-L webpage at http://listserv.ylassoc.com. The owners of the list
> can
>
>
>
>
>


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com



[login to unmask email]

Re: Indexing Like clauses
(in response to Myron Miller)
Jackson Reavill
To: DB2 Data Base Discussion List
12/23/02 02:34 PM <[login to unmask email]>
cc:
From: Jackson [login to unmask email]
Subject: Re: Indexing Like clauses(Document link: Jackson
Reavill)













Ok, I'm confused (as usual). According to the V7 manual these are the
rules...

| Table 101. Predicate types and processing
|
|_________________________________________ _________ __________
_________|
| | Index- | Stage |
|
| Predicate Type | able? | 1? | Notes
|

|_________________________________________|_________|__________|_________|

| COL LIKE '%char' | N | Y | 1, 6
|
| COL LIKE host variable | Y | Y | 2, 6
|
| COL LIKE 'pattern' | Y | Y | 6
|


Notes to Table 101:
1. Indexable only if an ESCAPE character is specified and used in the
LIKE predicate. For example, COL LIKE '+%char' ESCAPE '+' is
indexable.

2. Indexable only if the pattern in the host variable is an indexable
constant (for example, host variable='char%').

6. Not indexable or stage 1 if a field procedure exists on that column.


Please enlighten,
Jay


Jay Reavill
[login to unmask email]
IBM Global Services
Tampa, Florida
Tel: (813) 356-5317, Tie Line 8-697-5317




Myron Miller
<[login to unmask email] To: [login to unmask email]
OO.COM> cc:
Sent by: DB2 Data Subject: Re: Indexing Like clauses
Base Discussion
List
<[login to unmask email]
LASSOC.COM>


12/23/02 10:04 AM
Please respond to
DB2 Data Base
Discussion List





Actually, you can have the % at the front. And for an index only, you will
get
an index match of 1 and indexscan.

From what Pat Bossman tells me, the problem is that there aren't enough
stats
for DB2 to use the index if there is an index/data search, unless I specify
the
where clause as NAME = :HV. Then there will be an index scan and data
retrieval. DB2 thinks that a tablespace scan is more efficient than a
index
scan and subsequent random table I/O. Random I/O is very costly and DB2
tries
to avoid it at all cost, thus the preference for tablespace scans.

Pat suggested four alternatives:
1) using a host variable:
2) using OPTHINT to fool DB2 into using the index
3) Partitioning the tablespace into multiple parts, thus achieving IO
Parallelism
4) Using the DB2 Text Extender to accomplish this.

The last suggestion actually intrigues me the most. I'm going to try
implementing that this week.

Myron