[DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE

Walter Janißen

[DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE
Hi

A long time has passed, since I put the last question on the list. But here we
go. Does anybody know, if the number of merge_join_columns has any impact
on query performance.

In one of our queries this number was 2. When I add a superflous join
condition, I get 4 in this column.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Terry Purcell

Re: [DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE
(in response to Walter Janißen)
Hi Walter,

Yes it can have an impact on query performance. It all depends on the
filtering of the join predicates that are included & excluded.

I assume in your example you originally have 3 join predicates, but optimizer is
choosing to apply only 2 as merge join columns. Which means the 3rd is being
applied after the join (and if this is filtering then more rows unnecessary rows
survive the join only to be filtered after the join from the 3rd join predicate).

When you add the 4th bogus predicate, optimizer chooses to use all 4 as
merge join columns.

So why wouldn't optimizer always use all available join predicates as merge
join columns in a sort merge join (also known as merge scan join)?

Both tables need to be in order, which means a sort of one or both if indexes
do not provide order (or if it is inefficient to use the index). So it is likely that
you had an index that provided order on the 2 columns for at least one of the
tables, and optimizer needed to make a decision to either
1) avoid a sort and join on only 2 columns (leaving the 3rd to be applied after).
or
2) sort and join on all 3 columns.

Note this is my best guess given the information provided. Hope that helps.

Regards
Terry Purcell

On Wed, 21 Nov 2007 10:59:19 -0500, walter
<[login to unmask email]> wrote:

>Hi
>
>A long time has passed, since I put the last question on the list. But here we
>go. Does anybody know, if the number of merge_join_columns has any
impact
>on query performance.
>
>In one of our queries this number was 2. When I add a superflous join
>condition, I get 4 in this column.

IMPORTANT NOTICE:

IDUG is pleased to announce a series of upgrades to the DB2-L discussion listserv that are being implemented to improve reliability and the overall user experience of DB2-L. These changes are coming on November 30th. Details at http://www.idug.org

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Walter Jani&#223;en

Re: [DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE
(in response to Terry Purcell)
Hi Terry

Thank you for your reply. And your assumption was (surprisingly) right, if I had
exactly documented my superflous join condition. Indeed it were 2.
What I noticed, is, that DB2 does no transitive closure for join predicates. Is
that right?

This was the basic query:

SELECT *
FROM V$DB2FTB.EGTBQ100 AS VE
JOIN V$DB2FTB.EGTBQ102 AS VP
ON VE.MONAT = 12 AND VE.JAHR = 2006
AND VP.MONAT = 12 AND VP.JAHR = 2006
AND VE.V_VERW_SYS_SL IN ('AS','SV')
AND VP.V_VERW_SYS_SL IN ('AS','SV')
AND VE.V_VERW_SYS_SL = VP.V_VERW_SYS_SL
AND VE.VNR = VP.VNR
WHERE GES_SL = '0302'
AND ZW_SL <> 'EB' AND G_KZ <> '2' AND KS_KZ <> '2'
AND (V_STS_SL <> '7' OR V_END_DAT >= '31.12.2006' )
AND (KUEND_VORM_DAT IS NULL OR KUEND_VORM_DAT >= '31.12.2006')

So there are 2 join conditions, but there are two more, if transitive closure
would have taken place. Because when I add the following 2 condtions:

AND VE.JAHR = VP.JAHR AND VE.MONAT = VP.MONAT

MERGE_JOIN_COLS jumps from 2 to 4. So according to your explanation, there
is no change in performance, because the 2 predicates are evaluated before
the join, right?

I thought, that the number of join columns could have any influence of how
often DB2 has to jump from one merge stream to the other or how many rows
it can skip before the next comparison has to be made. Do you understand,
what I mean? But then, it would be, that the more join columns the worse.
What??

IMPORTANT NOTICE:

IDUG is pleased to announce a series of upgrades to the DB2-L discussion listserv that are being implemented to improve reliability and the overall user experience of DB2-L. These changes are coming on November 30th. Details at http://www.idug.org

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Terry Purcell

Re: [DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE
(in response to Walter Janißen)
Hi Walter,

DB2 does do transitive closure of join predicates, but it does not convert local
predicates into join predicates, which is what you were looking for here.

What I mean is that if you have join predicates T1.C1 = T2.C1 AND T2.C1 =
T3.C1, then DB2 will also generate T1.C1 = T3.C1.

But what you are asking for is if T1.C1 = 1 AND T2.C1 = 1, then DB2 could
also generate T1.C1 = T2.C1. And that doesn't happen.

And basically as you point out, there is no huge benefit to doing this (one
exception is if there are no join predicates and NLJ is a poor choice). For SMJ
(sort merge join), the local predicates are applied before the join. So only
rows from VE with MONAT = 12 AND JAHR = 2006, and rows from VP with
MONAT = 12 AND JAHR = 2006, will survive to be joined. Having 2 additional
join predicates on these columns will only add overhead of extra predicates to
apply.

I don't see extra merge join predicates as an overhead for the merge join
phase. Sure it is a longer key to compare (which affects sort cost), but if the
keys are filtering then generally you want to apply the predicates as early as
possible. And after local predicates and merge join predicates, any other join
predicates (eg. join expressions, or non-boolean cross table predicates) are
applied after the join. And this is a greater overhead than applying them
during the merge.

Regards
Terry Purcell

On Fri, 23 Nov 2007 11:15:34 -0500, walter <[login to unmask email]>
wrote:

>Hi Terry
>
>Thank you for your reply. And your assumption was (surprisingly) right, if I
had
>exactly documented my superflous join condition. Indeed it were 2.
>What I noticed, is, that DB2 does no transitive closure for join predicates. Is
>that right?
>
>This was the basic query:
>
>SELECT *
> FROM V$DB2FTB.EGTBQ100 AS VE
> JOIN V$DB2FTB.EGTBQ102 AS VP
> ON VE.MONAT = 12 AND VE.JAHR = 2006
> AND VP.MONAT = 12 AND VP.JAHR = 2006
> AND VE.V_VERW_SYS_SL IN ('AS','SV')
> AND VP.V_VERW_SYS_SL IN ('AS','SV')
> AND VE.V_VERW_SYS_SL = VP.V_VERW_SYS_SL
> AND VE.VNR = VP.VNR
> WHERE GES_SL = '0302'
> AND ZW_SL <> 'EB' AND G_KZ <> '2' AND KS_KZ <> '2'
> AND (V_STS_SL <> '7' OR V_END_DAT >= '31.12.2006' )
> AND (KUEND_VORM_DAT IS NULL OR KUEND_VORM_DAT >= '31.12.2006')
>
>So there are 2 join conditions, but there are two more, if transitive closure
>would have taken place. Because when I add the following 2 condtions:
>
>AND VE.JAHR = VP.JAHR AND VE.MONAT = VP.MONAT
>
>MERGE_JOIN_COLS jumps from 2 to 4. So according to your explanation,
there
>is no change in performance, because the 2 predicates are evaluated before
>the join, right?
>
>I thought, that the number of join columns could have any influence of how
>often DB2 has to jump from one merge stream to the other or how many
rows
>it can skip before the next comparison has to be made. Do you understand,
>what I mean? But then, it would be, that the more join columns the worse.
>What??

IMPORTANT NOTICE:

IDUG is pleased to announce a series of upgrades to the DB2-L discussion listserv that are being implemented to improve reliability and the overall user experience of DB2-L. These changes are coming on November 30th. Details at http://www.idug.org

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Terry Purcell

Re: [DB2 V8 NFM z/OS] MERGE_JOIN_COLS in PLAN_TABLE
(in response to Terry Purcell)
Hi Walter,

DB2 does do transitive closure of join predicates, but it does not convert local
predicates into join predicates, which is what you were looking for here.

What I mean is that if you have join predicates T1.C1 = T2.C1 AND T2.C1 =
T3.C1, then DB2 will also generate T1.C1 = T3.C1.

But what you are asking for is if T1.C1 = 1 AND T2.C1 = 1, then DB2 could
also generate T1.C1 = T2.C1. And that doesn't happen.

And basically as you point out, there is no huge benefit to doing this (one
exception is if there are no join predicates and NLJ is a poor choice). For SMJ
(sort merge join), the local predicates are applied before the join. So only
rows from VE with MONAT = 12 AND JAHR = 2006, and rows from VP with
MONAT = 12 AND JAHR = 2006, will survive to be joined. Having 2 additional
join predicates on these columns will only add overhead of extra predicates to
apply.

I don't see extra merge join predicates as an overhead for the merge join
phase. Sure it is a longer key to compare (which affects sort cost), but if the
keys are filtering then generally you want to apply the predicates as early as
possible. And after local predicates and merge join predicates, any other join
predicates (eg. join expressions, or non-boolean cross table predicates) are
applied after the join. And this is a greater overhead than applying them
during the merge.

Regards
Terry Purcell

On Fri, 23 Nov 2007 11:15:34 -0500, walter <[login to unmask email]>
wrote:

>Hi Terry
>
>Thank you for your reply. And your assumption was (surprisingly) right, if I
had
>exactly documented my superflous join condition. Indeed it were 2.
>What I noticed, is, that DB2 does no transitive closure for join predicates. Is
>that right?
>
>This was the basic query:
>
>SELECT *
> FROM V$DB2FTB.EGTBQ100 AS VE
> JOIN V$DB2FTB.EGTBQ102 AS VP
> ON VE.MONAT = 12 AND VE.JAHR = 2006
> AND VP.MONAT = 12 AND VP.JAHR = 2006
> AND VE.V_VERW_SYS_SL IN ('AS','SV')
> AND VP.V_VERW_SYS_SL IN ('AS','SV')
> AND VE.V_VERW_SYS_SL = VP.V_VERW_SYS_SL
> AND VE.VNR = VP.VNR
> WHERE GES_SL = '0302'
> AND ZW_SL <> 'EB' AND G_KZ <> '2' AND KS_KZ <> '2'
> AND (V_STS_SL <> '7' OR V_END_DAT >= '31.12.2006' )
> AND (KUEND_VORM_DAT IS NULL OR KUEND_VORM_DAT >= '31.12.2006')
>
>So there are 2 join conditions, but there are two more, if transitive closure
>would have taken place. Because when I add the following 2 condtions:
>
>AND VE.JAHR = VP.JAHR AND VE.MONAT = VP.MONAT
>
>MERGE_JOIN_COLS jumps from 2 to 4. So according to your explanation,
there
>is no change in performance, because the 2 predicates are evaluated before
>the join, right?
>
>I thought, that the number of join columns could have any influence of how
>often DB2 has to jump from one merge stream to the other or how many
rows
>it can skip before the next comparison has to be made. Do you understand,
>what I mean? But then, it would be, that the more join columns the worse
>What??

IMPORTANT NOTICE:

IDUG is pleased to announce a series of upgrades to the DB2-L discussion listserv that are being implemented to improve reliability and the overall user experience of DB2-L. These changes are coming on November 30th. Details at http://www.idug.org

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

IMPORTANT NOTICE:

IDUG is pleased to announce a series of upgrades to the DB2-L discussion listserv that are being implemented to improve reliability and the overall user experience of DB2-L. These changes are coming on November 30th. Details at http://www.idug.org

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm