Mad MAX goes again ....

Isaac Yassin

Mad MAX goes again ....
Hi,
While we are at it, take a look at the following query:

SELECT A.K_NUMBER
FROM   MyTable A
WHERE  A.K_NUMBER =
       (SELECT MAX(B.K_NUMBER)
        FROM   MyTable B
        WHERE  A.YEAR = B.YEAR)
GROUP BY A.K_NUMBER ;

On a big table (with indexes), on a 9672-R45, it takes 3 MIN 51.43 SEC of CPU time.

A small twist turns it into:

SELECT A.K_NUMBER
FROM
       (SELECT K_NUMBER, YEAR
        FROM   MyTable) AS A
       INNER JOIN
       (SELECT MAX(K_NUMBER) AS K_NUMBER, YEAR
        FROM   MyTable
        GROUP BY YEAR) AS B
       ON  A.YEAR = B.YEAR
       AND A.K_NUMBER = B.K_NUMBER
GROUP BY A.K_NUMBER ;

This one takes only 1 MIN 58.24 SEC of CPU time.



Isaac Yassin
DBMS & IT Consultant

Email: [login to unmask email]

Tel : +972 9 9505172
Cell: +972 54 452793
Fax : +972 9 9560803



Sanjeev (CTS) S

Re: Mad MAX goes again ....
(in response to Walter Janißen)
Hi Isaac,
It's interesting. It means that each and every SQL statement should be tried out and then coded. Is the access path of the second query one with MATCHCOLS 2, while for the first one it is MATCHCOLS 1 for the subselect and then perhaps MATCHCOLS 0? What are the indexes and their columns? I'm just guessing at a few reasons why it is so.
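For what it's worth, the MATCHCOLS question can be answered directly with EXPLAIN. A rough sketch along these lines (assuming a PLAN_TABLE already exists under the executing authid) would show the access path of each query block:

EXPLAIN PLAN SET QUERYNO = 1 FOR
  SELECT A.K_NUMBER
  FROM   MyTable A
  WHERE  A.K_NUMBER =
         (SELECT MAX(B.K_NUMBER)
          FROM   MyTable B
          WHERE  A.YEAR = B.YEAR)
  GROUP BY A.K_NUMBER ;

-- ...repeat with QUERYNO = 2 for the join version, then:

SELECT QUERYNO, QBLOCKNO, PLANNO, TNAME,
       ACCESSTYPE, MATCHCOLS, ACCESSNAME, PREFETCH
FROM   PLAN_TABLE
WHERE  QUERYNO IN (1, 2)
ORDER BY QUERYNO, QBLOCKNO, PLANNO ;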

Regards
Sanjeev






Walter Janißen

Re: Mad MAX goes again ....
(in response to Isaac Yassin)
Well, the first is a stage 2 predicate and the second is not, and stage 2 predicates are always more expensive than stage 1.



Isaac Yassin

Re: Mad MAX goes again ....
(in response to Sanjeev (CTS) S)
Hi,
The point is that one is correlated and the other isn't.
Correlation costs you dearly, especially when the values need more than 4 KB to be stored, so you need to overlay them.
Using the second method you build the table of aggregates (the MAX per YEAR) once and then join to it.
It was meant only to show that you should always check for better solutions, even if you're sure that you already have the right answer.
I've never seen it in any course material, as they always teach the regular correlation.
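By the way, the first nested table expression in the rewrite is not strictly needed. A sketch of the same idea, joining the base table directly to the grouped table expression (MAX_K is just an illustrative column name, and this variant has not been timed here), would be:

SELECT A.K_NUMBER
FROM   MyTable A
       INNER JOIN
       (SELECT YEAR, MAX(K_NUMBER) AS MAX_K
        FROM   MyTable
        GROUP BY YEAR) AS B
       ON  A.YEAR     = B.YEAR
       AND A.K_NUMBER = B.MAX_K
GROUP BY A.K_NUMBER ;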

Isaac Yassin
DBMS & IT Consultant



Venkat (PCA) Pillay

Re: Mad MAX goes again ....
(in response to Isaac Yassin)
Let me put my 2 cents worth -

The first query is a correlated subquery with an "=" condition, which is usually discouraged. A correlated subquery is more efficient with EXISTS-type predicates.

SELECT A.K_NUMBER
FROM   MyTable A
WHERE  A.K_NUMBER =
       (SELECT MAX(B.K_NUMBER)   -- executed as many times as there are rows in the outer table
        FROM   MyTable B
        WHERE  A.YEAR = B.YEAR)
GROUP BY A.K_NUMBER ;

The inner query is executed for each row of the outer table. Obviously, even if there is an index on (YEAR, K_NUMBER), this can be a hog if the COLCARD of K_NUMBER is significantly greater than the COLCARD of YEAR. If there is no index, it is even worse.

SELECT A.K_NUMBER
FROM
       (SELECT K_NUMBER, YEAR
        FROM   MyTable) AS A
       INNER JOIN
       (SELECT MAX(K_NUMBER) AS K_NUMBER, YEAR   -- one scan of the table
        FROM   MyTable
        GROUP BY YEAR) AS B
       ON  A.YEAR = B.YEAR
       AND A.K_NUMBER = B.K_NUMBER
GROUP BY A.K_NUMBER ;


By converting it to a join, the inner query is done in a single scan, and the creation of a small result set makes the access much faster.

The basic rule of thumb here is to use EXISTS for a correlated subquery wherever possible, and to use "=" for a non-correlated subquery.
If you want to use "=" with a correlated subquery, then watch out for the ratio COLCARD of K_NUMBER / COLCARD of YEAR.
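A quick way to eyeball that ratio is the catalog; a sketch (the creator and table names below are placeholders, and COLCARDF stays at -1 until RUNSTATS has been run):

SELECT NAME, COLCARDF
FROM   SYSIBM.SYSCOLUMNS
WHERE  TBCREATOR = 'MYCREATOR'
AND    TBNAME    = 'MYTABLE'
AND    NAME IN ('K_NUMBER', 'YEAR') ;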

Here is something more interesting (a slightly different topic than the correlated subquery above) on whether or not to self-join a table.

Michael Hannan's 1999 presentation, "When to Join a Table onto Itself", describes this in detail. Let me write the points here:

In extreme cases ONLY: where the cost of passing the selected columns to stage 2 is higher than performing a join to the same row using the clustering index columns, and the predicate does very strong filtering.


Regards,
Venkat Pillay




Venkat (PCA) Pillay

Re: Mad MAX goes again ....
(in response to Venkat (PCA) Pillay)
A correction to my earlier e-mail.

In the earlier mail I typed the following statement wrong:
"this can be a hog if the COLCARD of K_NUMBER is significantly greater than the COLCARD of YEAR"
In fact, it is the opposite.

The correlated subquery uses an in-memory table of 4K size to keep the intermediate result set. If the ratio (COLCARD of K_NUMBER / COLCARD of YEAR) is high and the table is clustered by YEAR, then the correlated subquery is a very good choice, because the results are kept in the memory table and no extra access is necessary.

DB2 also does not use the memory table under the following circumstances:
- The sum of the lengths of the correlated columns is more than 256 bytes.
- There is a unique index on a subset of the correlated columns of a table from the outer query.
- More than 16 columns are used for correlation.

Obviously, in Isaac's case, either there is a unique index or the 4K table is not used effectively, causing grief. Therefore the join appears to be the better choice.

However, given a choice, EXISTS is usually the right choice for a correlated subquery, and "=" is discouraged.
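For completeness, an EXISTS-style version of this particular requirement might look like the sketch below (not timed here; whether it beats the join form depends on the indexes available):

SELECT DISTINCT A.K_NUMBER
FROM   MyTable A
WHERE  NOT EXISTS
       (SELECT 1
        FROM   MyTable B
        WHERE  B.YEAR     = A.YEAR
        AND    B.K_NUMBER > A.K_NUMBER) ;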

Regards,
Venkat Pillay




Isaac Yassin

Re: Mad MAX goes again ....
(in response to Venkat (PCA) Pillay)
Hi,
Let's take a look at the 4 KB memory table.
Assuming an INT field for the GROUP BY column and another INT for the SUM, you get 8 bytes per entry.
4000/8 = 500 rows (OK, I'm cutting a corner here), and if you have more fields in the grouping it's even fewer.
So, if you have more "groups" than that, you start to lose when you overlay them.
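A quick sanity check of whether a given statement even fits in those roughly 500 entries is simply to count the groups, e.g. for the example above:

SELECT COUNT(DISTINCT YEAR) AS GROUPS_NEEDED
FROM   MyTable ;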

I would like DB2 to build a bigger "in-memory table", as this could save a lot.
I would go for up to 32 pages, or even make it a DSNZPARM parameter.

Isaac Yassin
DBMS & IT Consultant



Arli (SG-SEMA) Salim

Re: Mad MAX goes again ....
(in response to Isaac Yassin)
Hi Isaac & other Mad MAXes,

I'm a new DBA but am prepared to be shot down by you all, on or off the list.

I feel that there's a better SQL statement to do the same job. Could any of you confirm the equivalence and relative performance of this one:

select distinct max(k_number)
from MyTable
group by year;

Cheers,
arli


[login to unmask email]

Re: Mad MAX goes again ....
(in response to Arli (SG-SEMA) Salim)
Yes, this will work: by adding the GROUP BY clause, DB2 returns a row per YEAR even though YEAR is not in the SELECT list.

Dean





"Salim, Arli (SG-SEMA)" <[login to unmask email]>@RYCI.COM> on 12/12/2000
11:04:12 PM

Please respond to DB2 Data Base Discussion List <[login to unmask email]>

Sent by: DB2 Data Base Discussion List <[login to unmask email]>


To: [login to unmask email]
cc:

Subject: Re: Mad MAX goes again ....




Hi Isaac & other Mad MAXes,

I'm a new dba but am prepared to be shot down by you all,
on or off the list.

I feel that there's a better SQL to do the same job.
Could any of you confirm on the equivalency and relative
performance of this one:

select distinct max(k_number)
from MyTable
group by year;

Cheers,
arli

> >  SELECT A.K_NUMBER
> >  FROM   MyTable      A
> >  WHERE  A.K_NUMBER =
> >        (SELECT MAX(B.K_NUMBER)
> >         FROM   MyTable     B
> >         WHERE  A.YEAR=B.YEAR)
> >  GROUP BY A.K_NUMBER ;
> >
> > SELECT A.K_NUMBER
> > FROM
> >        (SELECT K_NUMBER,YEAR
> >         FROM   MyTable) AS A
> >  INNER JOIN
> >        (SELECT MAX(K_NUMBER) AS K_NUMBER,YEAR
> >         FROM   MyTable
> >         GROUP BY YEAR) AS B
> >  ON   A.YEAR = B.YEAR
> >  AND  A.K_NUMBER=B.K_NUMBER
> >  GROUP BY A.K_NUMBER ;
> >
> >





Venkat (PCA) Pillay

Re: Mad MAX goes again ....
(in response to DB46@DAIMLERCHRYSLER.COM)
Of course you are right. It is really simple, and there is no need to twist and test the query. But what we were discussing was the efficient/inefficient use of a subquery via the 4K memory table; the example itself is immaterial.




Arli (SG-SEMA) Salim

Re: Mad MAX goes again ....
(in response to Venkat (PCA) Pillay)
Hi Venkat,

I see what you're saying. The SQLs happened to intrigue me since they were run on a real machine and so, presumably, were taken from a real-life application.

Does the 4K limit need to be increased? It would be very interesting to find out what the optimal size should be and what supporting evidence, if any, there is for it.

I learned a great deal from the excellent explanations given
by you all. Many thanks.

Cheers,
arli


Terry Purcell

Re: Mad MAX goes again ....
(in response to Arli (SG-SEMA) Salim)
Arli & Venkat,

I had been off the list and unfortunately missed the trail on this.

The 4K cache is the limit and there is no way to alter the size.

With the combination of the DISTINCT and GROUP BY, the DISTINCT is redundant
and therefore discarded by the optimizer.

Also, regarding the comparison of the original two queries (I know it was a logical comparison), the main difference between the GROUP BY expression and the correlated subquery is as follows:

Both examples require the entire MYTABLE table to be processed, and the most efficient method is a sequential access method. The correlated subquery requires all rows of the outer query to be read and every row to be passed to the inner subquery (of course, repeated executions can be reduced by the 4K correlation cache, dependent on how random the inner table sequence is). The inner table accesses MAY benefit from index lookaside and dynamic prefetch, but even index lookaside requires access to every index non-leaf page once the end of each leaf page is reached.

Alternatively, the GROUP BY will utilize pure sequential prefetch (rather than the less efficient dynamic prefetch, which is itself still more efficient than synchronous I/O), and will not require getpages on any non-leaf pages, as the non-matching index scan (assuming this is the access path chosen) scans leaf pages only. The GROUP BY may avoid a sort due to an available index, but will materialize the NTE. The inner join to the other NTE will perform an NTE merge, so that there is only one materialisation (even though the query visually contains two NTEs).
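None of the posts listed the actual indexes, but purely as an illustration, an index along the lines of the sketch below on (YEAR, K_NUMBER) is the kind that would give the correlated form a matching index scan on YEAR and would let the GROUP BY YEAR avoid the sort:

CREATE INDEX XMYTAB01                -- hypothetical index name
    ON MyTable (YEAR, K_NUMBER) ;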

The upshot is that, when processing the entire result set, a pure sequential approach involving a GROUP BY encapsulated within an NTE can easily outperform a correlated subquery, which must pass every row to stage 2 either to execute the subquery or to match the correlation predicates against the TRUE/FALSE result in the correlation cache.

In addition to these two methods, there is another coding technique which can retrieve both the summary and the detail columns within a single query block (provided an index can avoid a sort), rather than the two examples shown, which both require two accesses of the table.

You can check this out at YL&A's Advanced & Complex SQL courses, or I can pass it on to the list if required.

Regards
Terry Purcell
[login to unmask email]
Yevich Lawson & Associates
WWW.YLASSOC.COM
WWW.DB2-PORTAL.COM
WWW.DB2-SYMPOSIUM.COM



Sanjeev (CTS) S

Re: Mad MAX goes again ....
(in response to Terry Purcell)
Hi Arli,
What if I point out something which may be wrong? Someone will point me in the right direction, and that's good. I sent this mail yesterday as well, but it bounced back because of a change in my e-mail address. Thanks to the list owner for changing the address.

I think the result set "may" be different because of the grouping you have done on YEAR. In the SQLs mentioned in Isaac's mail you can miss the records of some years if the MAX(K_NUMBER) is the same for some of the years; in your SQL you will be getting those as well. I hope I have understood things correctly.
Regarding performance, I would say check all the possibilities against all the stats and choose the better one.
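A tiny made-up data set makes the equivalence question easy to test for yourself (table name and values are invented purely for illustration):

CREATE TABLE TESTTAB (K_NUMBER INTEGER, YEAR SMALLINT) ;
INSERT INTO TESTTAB VALUES (10, 1999) ;
INSERT INTO TESTTAB VALUES ( 5, 1999) ;
INSERT INTO TESTTAB VALUES (10, 2000) ;
INSERT INTO TESTTAB VALUES ( 7, 2000) ;
-- Two years share the same maximum K_NUMBER here, so running Arli's
-- form and the two original forms against this table shows whether
-- the result sets really differ.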

HTH
Regards
Sanjeev


