Non-Correlated subquery and sparse index

Joseph Burns

Non-Correlated subquery and sparse index
Hello,

I have SQL with a non-correlated subquery (about 120,000 unique values are
returned by the subquery). My understanding is that the result of the
subquery would be sorted to remove duplicates and then a sparse index
would be built on it by DB2. The sparse index is used to see if the outer
table predicate is IN the list.

SELECT *
FROM TABLE_1
WHERE CLI_NM IN
(SELECT CLI_NM
FROM TABLE_2)

TABLE_1 has about 200,000 rows in it (and there are no filtering
predicates other than the "IN" subquery).

TABLE_2 also has about 200,000 rows in it, but the non-corr select should
sort to remove duplicates (which reduces it to 120,000 unique values).

This seems to perform reasonably well. For each of the 200,000 rows in
TABLE_1 it checks to see if the CLI_NM is "IN" the 120,000 values that the
subquery has. My assumption is that it performs reasonably well because
DB2 built a sparse index on the 120,000 values. So it can look through
those 120,000 values pretty fast (even if it does have to do it 200,000
times)

However, if I change the query to add a function in the subquery, then it
seems to have a performance issue.

SELECT *
FROM TABLE_1
WHERE CLI_NM IN
(SELECT SUBSTR(CLI_NM,1,20) || 'AB'
FROM TABLE_2)

I'm not sure why the function makes such a big difference, but the query
runs signficantly longer (way longer). My assumption is that DB2 did not
build the sparse index in this case. So now it has to look through the
whole list of 120,000 values each time (it does that 1 time for each of
the 200,000 rows of the outer table - TABLE_1).

It would be nice if DB2 retrieved the list of 120,000 values (applied the
SUBSTR) and then build a sparse index on it. The length of the SUBSTR
with the concat in the subquery is exactly the same length as the CLI_NM
column in TABLE_1 (the outer table).

There are no real indexes defined on either table using the CLI_NM column,
so it's not a case of one using an index and the other not. The explain
output for both queries is the same as far as I can see (see below).

If anyone can shed some light it would be appreciated.

---------+---------+---------+---------+---------+---------+-----
QBLK/PLNNO QTYPE METHOD TNAME ACC_TYP MATCHCOLS
---------+---------+---------+---------+---------+---------+-----
1 -1 SELECT(0 ) 0 TABLE_1 R 00
2 -1 NCOSUB(1 ) 0 TABLE_2 R 00
2 -2 NCOSUB(1 ) 3 00


Thanks,
Joe Burns
Highmark Inc.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Suresh Sane

Re: Non-Correlated subquery and sparse index
(in response to Joseph Burns)
Joe,

A couple of thoughts:

(a) Have you tried CASTing the result of the subselect? I am surmising
that the CONCAT makes it a VARCHAR which mismatches the CHAR declaration for
the column.

(B) Have you tried a simple join - would expect a merge scan join which
ought to be faster.

Thx
Suresh

>From: Joe Burns <[login to unmask email]>
>Reply-To: DB2 Database Discussion list at IDUG <[login to unmask email]>
>To: [login to unmask email]
>Subject: [DB2-L] Non-Correlated subquery and sparse index
>Date: Thu, 25 Jan 2007 14:42:03 -0600
>
>Hello,
>
>I have SQL with a non-correlated subquery (about 120,000 unique values are
>returned by the subquery). My understanding is that the result of the
>subquery would be sorted to remove duplicates and then a sparse index
>would be built on it by DB2. The sparse index is used to see if the outer
>table predicate is IN the list.
>
>SELECT *
>FROM TABLE_1
>WHERE CLI_NM IN
> (SELECT CLI_NM
> FROM TABLE_2)
>
>TABLE_1 has about 200,000 rows in it (and there are no filtering
>predicates other than the "IN" subquery).
>
>TABLE_2 also has about 200,000 rows in it, but the non-corr select should
>sort to remove duplicates (which reduces it to 120,000 unique values).
>
>This seems to perform reasonably well. For each of the 200,000 rows in
>TABLE_1 it checks to see if the CLI_NM is "IN" the 120,000 values that the
>subquery has. My assumption is that it performs reasonably well because
>DB2 built a sparse index on the 120,000 values. So it can look through
>those 120,000 values pretty fast (even if it does have to do it 200,000
>times)
>
>However, if I change the query to add a function in the subquery, then it
>seems to have a performance issue.
>
>SELECT *
>FROM TABLE_1
>WHERE CLI_NM IN
> (SELECT SUBSTR(CLI_NM,1,20) || 'AB'
> FROM TABLE_2)
>
>I'm not sure why the function makes such a big difference, but the query
>runs signficantly longer (way longer). My assumption is that DB2 did not
>build the sparse index in this case. So now it has to look through the
>whole list of 120,000 values each time (it does that 1 time for each of
>the 200,000 rows of the outer table - TABLE_1).
>
>It would be nice if DB2 retrieved the list of 120,000 values (applied the
>SUBSTR) and then build a sparse index on it. The length of the SUBSTR
>with the concat in the subquery is exactly the same length as the CLI_NM
>column in TABLE_1 (the outer table).
>
>There are no real indexes defined on either table using the CLI_NM column,
>so it's not a case of one using an index and the other not. The explain
>output for both queries is the same as far as I can see (see below).
>
>If anyone can shed some light it would be appreciated.
>
>---------+---------+---------+---------+---------+---------+-----
>QBLK/PLNNO QTYPE METHOD TNAME ACC_TYP MATCHCOLS
>---------+---------+---------+---------+---------+---------+-----
>1 -1 SELECT(0 ) 0 TABLE_1 R 00
>2 -1 NCOSUB(1 ) 0 TABLE_2 R 00
>2 -2 NCOSUB(1 ) 3 00
>
>
>Thanks,
>Joe Burns
>Highmark Inc.
>
>---------------------------------------------------------------------------------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home
>page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select
>"Join or Leave the list". The IDUG DB2-L FAQ is at
>http://www.idugdb2-l.org. The IDUG List Admins can be reached at
>[login to unmask email] Find out the latest on IDUG conferences at
>http://conferences.idug.org/index.cfm

_________________________________________________________________
Search for grocery stores. Find gratitude. Turn a simple search into
something more.
http://click4thecause.live.com/search/charity/default.aspx?source=hmemtagline_gratitude&FORM=WLMTAG

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Joseph Burns

Re: Non-Correlated subquery and sparse index
(in response to Suresh Sane)
Hi Suresh,

Well, it looks like the CAST function is a winner. I put it in and things
started running much better. So the following SQL performs fairly well.

SELECT *
FROM TABLE_1
WHERE CLI_NM IN (
SELECT CAST((SUBSTR(CLI_NM,1,20) || 'AB') AS CHAR(22))
FROM TABLE_2)

It only seems to work if I cast to the exact length as the CLI_NM from
TABLE_1, but that is perfectly fine for my uses.

And when the query runs, it performs fairly well, so I suspect the sparse
index is being build and used by DB2. That's great. Thanks for the
suggestion. I'll give credit where it's due, and you deserve some (even
if you do root for a very poor football team).

But seriously, thanks again for the suggestion.

Regards,
Joe.


On Thu, 25 Jan 2007 17:30:36 -0600, Suresh Sane <[login to unmask email]>
wrote:

>Joe,
>
>A couple of thoughts:
>
>(a) Have you tried CASTing the result of the subselect? I am surmising
>that the CONCAT makes it a VARCHAR which mismatches the CHAR declaration
for
>the column.
>
>(B) Have you tried a simple join - would expect a merge scan join which
>ought to be faster.
>
>Thx
>Suresh
>
>>From: Joe Burns <[login to unmask email]>
>>Reply-To: DB2 Database Discussion list at IDUG <[login to unmask email]>
>>To: [login to unmask email]
>>Subject: [DB2-L] Non-Correlated subquery and sparse index
>>Date: Thu, 25 Jan 2007 14:42:03 -0600
>>
>>Hello,
>>
>>I have SQL with a non-correlated subquery (about 120,000 unique values
are
>>returned by the subquery). My understanding is that the result of the
>>subquery would be sorted to remove duplicates and then a sparse index
>>would be built on it by DB2. The sparse index is used to see if the
outer
>>table predicate is IN the list.
>>
>>SELECT *
>>FROM TABLE_1
>>WHERE CLI_NM IN
>> (SELECT CLI_NM
>> FROM TABLE_2)
>>
>>TABLE_1 has about 200,000 rows in it (and there are no filtering
>>predicates other than the "IN" subquery).
>>
>>TABLE_2 also has about 200,000 rows in it, but the non-corr select should
>>sort to remove duplicates (which reduces it to 120,000 unique values).
>>
>>This seems to perform reasonably well. For each of the 200,000 rows in
>>TABLE_1 it checks to see if the CLI_NM is "IN" the 120,000 values that
the
>>subquery has. My assumption is that it performs reasonably well because
>>DB2 built a sparse index on the 120,000 values. So it can look through
>>those 120,000 values pretty fast (even if it does have to do it 200,000
>>times)
>>
>>However, if I change the query to add a function in the subquery, then it
>>seems to have a performance issue.
>>
>>SELECT *
>>FROM TABLE_1
>>WHERE CLI_NM IN
>> (SELECT SUBSTR(CLI_NM,1,20) || 'AB'
>> FROM TABLE_2)
>>
>>I'm not sure why the function makes such a big difference, but the query
>>runs signficantly longer (way longer). My assumption is that DB2 did not
>>build the sparse index in this case. So now it has to look through the
>>whole list of 120,000 values each time (it does that 1 time for each of
>>the 200,000 rows of the outer table - TABLE_1).
>>
>>It would be nice if DB2 retrieved the list of 120,000 values (applied the
>>SUBSTR) and then build a sparse index on it. The length of the SUBSTR
>>with the concat in the subquery is exactly the same length as the CLI_NM
>>column in TABLE_1 (the outer table).
>>
>>There are no real indexes defined on either table using the CLI_NM
column,
>>so it's not a case of one using an index and the other not. The explain
>>output for both queries is the same as far as I can see (see below).
>>
>>If anyone can shed some light it would be appreciated.
>>
>>---------+---------+---------+---------+---------+---------+-----
>>QBLK/PLNNO QTYPE METHOD TNAME ACC_TYP MATCHCOLS
>>---------+---------+---------+---------+---------+---------+-----
>>1 -1 SELECT(0 ) 0 TABLE_1 R 00
>>2 -1 NCOSUB(1 ) 0 TABLE_2 R 00
>>2 -2 NCOSUB(1 ) 3 00
>>
>>
>>Thanks,
>>Joe Burns
>>Highmark Inc.
>>
>>-------------------------------------------------------------------------
--------
>>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home
>>page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select
>>"Join or Leave the list". The IDUG DB2-L FAQ is at
>>http://www.idugdb2-l.org. The IDUG List Admins can be reached at
>>[login to unmask email] Find out the latest on IDUG conferences
at
>>http://conferences.idug.org/index.cfm
>
>_________________________________________________________________
>Search for grocery stores. Find gratitude. Turn a simple search into
>something more.
>http://click4thecause.live.com/search/charity/default.aspx?
source=hmemtagline_gratitude&FORM=WLMTAG
>
>--------------------------------------------------------------------------
-------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm