Re: Clustering affects sorts

Michael Hannan

Re: Clustering affects sorts
(in response to ssethi@LOT.TATASTEEL.COM)
Helen,

DB2 could potentially do any sort it felt like to eliminate duplicates, but
it just so happens that it sorts on columns in sequence specified, ascending
on all of them.

Note however that non-correlated IN subqueries sort for uniqueness into
descending sequence and only 1 sort used despite 2 sort indicators in the
Explain.

Logically and theoretically ORDER BY is required to guarantee the sequence
that you want.

Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
are unique when a subset of columns forms a unique index.
In this case rows could be in any order at all.
ORDER BY is definitely needed here both theoretically and practically.

DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
in general. I often avoid DISTINCT sorts (on single tables) by trying to
include a full unique key, even though I may wish to use DISTINCT to force
materialisation of an intermediate result set.

Otherwise try to avoid using DISTINCT as far as possible.
I am mostly not keen on the performance of DISTINCTs that cause extra sorts.

So in practice there may be cases where you can guarantee that DB2 will do a
DISTINCT sort and where ORDER BY will make no difference other than worse
performance. So although theoretically incorrect, in practice it may work.
For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
that the ORDER BY would be a waste of time.
Untheoretical expediency sometimes influences our judgement.

Commenting on the original.
Clearly poor clustering (90% is already poor, 95% moderate) will strongly
encourage a TS Scan with Sort in preference to Sort Avoidance using the
index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
fetched. The cost of full scan of a poorly clustered index is far too much I/O.

From: Michael Hannan

>From: Helen Johnson <[login to unmask email]>
>Subject: Re: Clustering affects sorts
>To: [login to unmask email]
>
>I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?
>
>Helen
>
>[login to unmask email] wrote:
>
>> Hi,
>>
>> Although sorts can be avoided by use of an index (clustered or
>> non-clustered) in the correct sequence, you should ALWAYS include
>> an ORDER BY clause if you want the data in a certain order. DB2
>> could always switch to a different index which may require a sort
>> or return the data in a different order if the ORDER BY is omitted.
>>
>> Regards,
>>
>> Michael Levine
>> Premier Data Services, Inc.
>>
>> >Hi all,
>> > I was going through the article of Bonnie Baker in the latest edition
>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
>> >.
>> > It raised a question in my mind that up to what extent the
>> >clustering of an index affects the avoidance of an ORDER BY clause except
>> >that LIST PREFETCH will not occur.
>> >
>> >Thanks in advance for any help...
>> >
>> >Sanjeev
>> >
>> >
>> >
>>
>>
>> >
>> >
>>
>>
>>


>
>
>


>



Michael Hannan

Re: Clustering affects sorts
(in response to Helen Johnson)
Jim,

It looks like DB2 does avoid the 2nd sort in many cases based on a very
quick test run. Not sure that this was always the case.

My test appeared to show no measurable extra cost for ORDER BY on top of
DISTINCT. I'll try to do more definitive tests shortly.

There are some cases where DB2 does sort into same order twice, but this may
not be one of them.

From: Michael Hannan

>From: "Chie, Jim (C)" <[login to unmask email]>
>Subject: Re: Clustering affects sorts
>To: [login to unmask email]
>
>I wonder is there a real cost to the second sort, or does DB2 simply call
>the sort routine which says "I'm already sorted so leave me alone", but it
>still shows as a sort in the explain table.
>
>-----Original Message-----
>From: Michael Hannan [mailto:[login to unmask email]
>Sent: Saturday, December 17, 1994 8:52 AM
>To: [login to unmask email]
>Subject: Re: Clustering affects sorts
>
>
>Helen,
>
>DB2 could potentially do any sort it felt like to eliminate duplicates, but
>it just so happens that it sorts on columns in sequence specified, ascending
>on all of them.
>
>Note however that non-correlated IN subqueries sort for uniqueness into
>descending sequence and only 1 sort used despite 2 sort indicators in the
>Explain.
>
>Logically and theoretically ORDER BY is required to guarantee the sequence
>that you want.
>
>Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
>are unique when a subset of columns forms a unique index.
>In this case rows could be in any order at all.
>ORDER BY is definitely needed here both theoretically and practically.
>
>DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
>in general. I often avoid DISTINCT sorts (on single tables) by trying to
>include a full unique key, even though I may wish to use DISTINCT to force
>materialisation of an intermediate result set.
>
>Otherwise try to avoid using DISTINCT as far as possible.
>I am mostly not keen on the performance of DISTINCTs that cause extra sorts.
>
>So in practice there may be cases where you can guarantee that DB2 will do a
>DISTINCT sort and where ORDER BY will make no difference other than worse
>performance. So although theoretically incorrect, in practice it may work.
>For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
>happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
>that the ORDER BY would be a waste of time.
>Untheoretical expediency sometimes influences our judgement.
>
>Commenting on the original.
>Clearly poor clustering (90% is already poor, 95% moderate) will strongly
>encourage a TS Scan with Sort in preference to Sort Avoidance using the
>index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
>fetched. The cost of full scan of a poorly clustered index is far too much
>I/O.
>
>From: Michael Hannan
>
>>From: Helen Johnson <[login to unmask email]>
>>Subject: Re: Clustering affects sorts
>>To: [login to unmask email]
>>
>>I have always told programmers and users to use ORDER BY to be certain of
>the data order. But what about when the SQL contains a DISTINCT and the
>columns are listed in the same order as
>>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
>and ORDER BY), wouldn't the ORDER BY sort be redundant?
>>
>>Helen
>>
>>[login to unmask email] wrote:
>>
>>> Hi,
>>>
>>> Although sorts can be avoided by use of an index (clustered or
>>> non-clustered) in the correct sequence, you should ALWAYS include
>>> an ORDER BY clause if you want the data in a certain order. DB2
>>> could always switch to a different index which may require a sort
>>> or return the data in a different order if the ORDER BY is omitted.
>>>
>>> Regards,
>>>
>>> Michael Levine
>>> Premier Data Services, Inc.
>>>
>>> >Hi all,
>>> > I was going through the article of Bonnie Baker in the latest
>edition
>>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER
>BY
>>> >.
>>> > It raised a question in my mind that up to what extent the
>>> >clustering of an index affects the avoidance of an ORDER BY clause
>except
>>> >that LIST PREFETCH will not occur.
>>> >
>>> >Thanks in advance for any help...
>>> >
>>> >Sanjeev
>>> >
>>> >
>>> >
>>> the DB2-L webpage at http://www.ryci.com/db2-l. The owners of the list
>can
>>>
>>> >
>>> >
>>>
>>>
>>>
>
>
>>
>>
>>
>
>
>>
>
>
>
>
>
>
>
>


>



Michael Hannan

Re: Clustering affects sorts
(in response to Michael Hannan)
Helen,

I would like to take back my previous answer.

My tests seem to indicate now that once SELECT DISTINCT is coded and causes
a DB2 RDS Sort, additional ORDER BY takes no appreciable additional CPU and
presumeably utilises just one sort to do both DISTINCT and ORDER BY.

My test included a join with Selected columns from both tables and ORDER BY
columns 2, 3, 1. CPU roughly same with or without ORDER BY.

I previously had not noticed this except in non-correlated IN subqueries
doing single sort for both Uniqueness and for Order By.

So therefore by all means code both DISTINCT and ORDER BY.
No need to assume ordering from DISTINCT.

From: Michael Hannan

>Date: Sun, 18 Dec 1994 02:52:10 +1000
>To: [login to unmask email]
>From: Michael Hannan <[login to unmask email]>
>Subject: Re: Clustering affects sorts
>
>Helen,
>
>DB2 could potentially do any sort it felt like to eliminate duplicates, but
it just so happens that it sorts on columns in sequence specified, ascending
on all of them.
>
>Note however that non-correlated IN subqueries sort for uniqueness into
descending sequence and only 1 sort used despite 2 sort indicators in the
Explain.
>
>Logically and theoretically ORDER BY is required to guarantee the sequence
that you want.
>
>Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
are unique when a subset of columns forms a unique index.
>In this case rows could be in any order at all.
>ORDER BY is definitely needed here both theoretically and practically.
>
>DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
in general. I often avoid DISTINCT sorts (on single tables) by trying to
include a full unique key, even though I may wish to use DISTINCT to force
materialisation of an intermediate result set.
>
>Otherwise try to avoid using DISTINCT as far as possible.
>I am mostly not keen on the performance of DISTINCTs that cause extra sorts.
>
>So in practice there may be cases where you can guarantee that DB2 will do
a DISTINCT sort and where ORDER BY will make no difference other than worse
performance. So although theoretically incorrect, in practice it may work.
>For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
that the ORDER BY would be a waste of time.
>Untheoretical expediency sometimes influences our judgement.
>
>Commenting on the original.
>Clearly poor clustering (90% is already poor, 95% moderate) will strongly
encourage a TS Scan with Sort in preference to Sort Avoidance using the
index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
fetched. The cost of full scan of a poorly clustered index is far too much I/O.
>
>From: Michael Hannan
>
>>From: Helen Johnson <[login to unmask email]>
>>Subject: Re: Clustering affects sorts
>>To: [login to unmask email]
>>
>>I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
>>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?
>>
>>Helen
>>
>>[login to unmask email] wrote:
>>
>>> Hi,
>>>
>>> Although sorts can be avoided by use of an index (clustered or
>>> non-clustered) in the correct sequence, you should ALWAYS include
>>> an ORDER BY clause if you want the data in a certain order. DB2
>>> could always switch to a different index which may require a sort
>>> or return the data in a different order if the ORDER BY is omitted.
>>>
>>> Regards,
>>>
>>> Michael Levine
>>> Premier Data Services, Inc.
>>>
>>> >Hi all,
>>> > I was going through the article of Bonnie Baker in the latest edition
>>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
>>> >.
>>> > It raised a question in my mind that up to what extent the
>>> >clustering of an index affects the avoidance of an ORDER BY clause except
>>> >that LIST PREFETCH will not occur.
>>> >
>>> >Thanks in advance for any help...
>>> >
>>> >Sanjeev
>>> >
>>> >
>>> >
>>>
>>>
>>> >
>>> >
>>>
>>>
>>>


>>
>>
>>


>>
>



Michael Hannan

Re: Clustering affects sorts
(in response to Helen Johnson)
ORDER BY 2, 3, 1
was different to DISTINCT 1, 2, 3
but logically the sequence of columns for DISTINCT is unimportant, so I
guess DB2 might have internally converted to to DISTINCT 2, 3, 1 to match
the ORDER BY.

From: Michael Hannan
>From: Helen Johnson <[login to unmask email]>
>Subject: Re: Clustering affects sorts
>To: [login to unmask email]
>
>Michael,
>
>In your tests, was the order of the DISTINCT columns different from the
order of the ORDER BY columns, or were both the same? Thanks,
>
>Helen
>
>Michael Hannan wrote:
>
>> Helen,
>>
>> I would like to take back my previous answer.
>>
>> My tests seem to indicate now that once SELECT DISTINCT is coded and causes
>> a DB2 RDS Sort, additional ORDER BY takes no appreciable additional CPU and
>> presumeably utilises just one sort to do both DISTINCT and ORDER BY.
>>
>> My test included a join with Selected columns from both tables and ORDER BY
>> columns 2, 3, 1. CPU roughly same with or without ORDER BY.
>>
>> I previously had not noticed this except in non-correlated IN subqueries
>> doing single sort for both Uniqueness and for Order By.
>>
>> So therefore by all means code both DISTINCT and ORDER BY.
>> No need to assume ordering from DISTINCT.
>>
>> From: Michael Hannan
>>
>> >Date: Sun, 18 Dec 1994 02:52:10 +1000
>> >To: [login to unmask email]
>> >From: Michael Hannan <[login to unmask email]>
>> >Subject: Re: Clustering affects sorts
>> >
>> >Helen,
>> >
>> >DB2 could potentially do any sort it felt like to eliminate duplicates, but
>> it just so happens that it sorts on columns in sequence specified, ascending
>> on all of them.
>> >
>> >Note however that non-correlated IN subqueries sort for uniqueness into
>> descending sequence and only 1 sort used despite 2 sort indicators in the
>> Explain.
>> >
>> >Logically and theoretically ORDER BY is required to guarantee the sequence
>> that you want.
>> >
>> >Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
>> are unique when a subset of columns forms a unique index.
>> >In this case rows could be in any order at all.
>> >ORDER BY is definitely needed here both theoretically and practically.
>> >
>> >DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
>> in general. I often avoid DISTINCT sorts (on single tables) by trying to
>> include a full unique key, even though I may wish to use DISTINCT to force
>> materialisation of an intermediate result set.
>> >
>> >Otherwise try to avoid using DISTINCT as far as possible.
>> >I am mostly not keen on the performance of DISTINCTs that cause extra sorts.
>> >
>> >So in practice there may be cases where you can guarantee that DB2 will do
>> a DISTINCT sort and where ORDER BY will make no difference other than worse
>> performance. So although theoretically incorrect, in practice it may work.
>> >For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
>> happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
>> that the ORDER BY would be a waste of time.
>> >Untheoretical expediency sometimes influences our judgement.
>> >
>> >Commenting on the original.
>> >Clearly poor clustering (90% is already poor, 95% moderate) will strongly
>> encourage a TS Scan with Sort in preference to Sort Avoidance using the
>> index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
>> fetched. The cost of full scan of a poorly clustered index is far too
much I/O.
>> >
>> >From: Michael Hannan
>> >
>> >>From: Helen Johnson <[login to unmask email]>
>> >>Subject: Re: Clustering affects sorts
>> >>To: [login to unmask email]
>> >>
>> >>I have always told programmers and users to use ORDER BY to be certain of
>> the data order. But what about when the SQL contains a DISTINCT and the
>> columns are listed in the same order as
>> >>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
>> and ORDER BY), wouldn't the ORDER BY sort be redundant?
>> >>
>> >>Helen
>> >>
>> >>[login to unmask email] wrote:
>> >>
>> >>> Hi,
>> >>>
>> >>> Although sorts can be avoided by use of an index (clustered or
>> >>> non-clustered) in the correct sequence, you should ALWAYS include
>> >>> an ORDER BY clause if you want the data in a certain order. DB2
>> >>> could always switch to a different index which may require a sort
>> >>> or return the data in a different order if the ORDER BY is omitted.
>> >>>
>> >>> Regards,
>> >>>
>> >>> Michael Levine
>> >>> Premier Data Services, Inc.
>> >>>
>> >>> >Hi all,
>> >>> > I was going through the article of Bonnie Baker in the latest
edition
>> >>> >of DB2 MAG and found some good ideas and some indepth analysis on
ORDER BY
>> >>> >.
>> >>> > It raised a question in my mind that up to what extent the
>> >>> >clustering of an index affects the avoidance of an ORDER BY clause
except
>> >>> >that LIST PREFETCH will not occur.
>> >>> >
>> >>> >Thanks in advance for any help...
>> >>> >
>> >>> >Sanjeev
>> >>> >
>> >>> >
>> >>> >
>> >>> the DB2-L webpage at http://www.ryci.com/db2-l. The owners of the
list can
>> >>>
>> >>> >
>> >>> >
>> >>>
>> >>>
>> >>>
>>
>>
>> >>
>> >>
>> >>
>>
>>
>> >>
>> >
>>
>>
>>


>
>
>


>



[login to unmask email]

Clustering affects sorts
Hi all,
I was going through the article of Bonnie Baker in the latest edition
of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
.
It raised a question in my mind that up to what extent the
clustering of an index affects the avoidance of an ORDER BY clause except
that LIST PREFETCH will not occur.

Thanks in advance for any help...

Sanjeev



[login to unmask email]

Re: Clustering affects sorts
(in response to ssethi@LOT.TATASTEEL.COM)
Hi,

Although sorts can be avoided by use of an index (clustered or
non-clustered) in the correct sequence, you should ALWAYS include
an ORDER BY clause if you want the data in a certain order. DB2
could always switch to a different index which may require a sort
or return the data in a different order if the ORDER BY is omitted.

Regards,

Michael Levine
Premier Data Services, Inc.



>Hi all,
> I was going through the article of Bonnie Baker in the latest edition
>of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
>.
> It raised a question in my mind that up to what extent the
>clustering of an index affects the avoidance of an ORDER BY clause except
>that LIST PREFETCH will not occur.
>
>Thanks in advance for any help...
>
>Sanjeev
>
>
>


>
>



Helen Johnson

Re: Clustering affects sorts
(in response to Mike_Levine@TEKHELP.NET)
I have always told programmers and users to use ORDER BY to be certain of the data order. But what about when the SQL contains a DISTINCT and the columns are listed in the same order as
the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT and ORDER BY), wouldn't the ORDER BY sort be redundant?

Helen

[login to unmask email] wrote:

> Hi,
>
> Although sorts can be avoided by use of an index (clustered or
> non-clustered) in the correct sequence, you should ALWAYS include
> an ORDER BY clause if you want the data in a certain order. DB2
> could always switch to a different index which may require a sort
> or return the data in a different order if the ORDER BY is omitted.
>
> Regards,
>
> Michael Levine
> Premier Data Services, Inc.
>
> >Hi all,
> > I was going through the article of Bonnie Baker in the latest edition
> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
> >.
> > It raised a question in my mind that up to what extent the
> >clustering of an index affects the avoidance of an ORDER BY clause except
> >that LIST PREFETCH will not occur.
> >
> >Thanks in advance for any help...
> >
> >Sanjeev
> >
> >
> >
>
>
> >
> >
>
>
>



[login to unmask email]

Re: Clustering affects sorts
(in response to Helen Johnson)
Hi Helen,

If nothing else, you should leave the ORDER BY in for documentation
purposes. With a SELECT DISTINCT, the desired order may not be obvious
to future programmers who may make modifications to the code.

Regards,

Michael Levine
Premier Data Services, Inc.



>I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?
>
>Helen
>
>[login to unmask email] wrote:
>
>> Hi,
>>
>> Although sorts can be avoided by use of an index (clustered or
>> non-clustered) in the correct sequence, you should ALWAYS include
>> an ORDER BY clause if you want the data in a certain order. DB2
>> could always switch to a different index which may require a sort
>> or return the data in a different order if the ORDER BY is omitted.
>>
>> Regards,
>>
>> Michael Levine
>> Premier Data Services, Inc.
>>
>> >Hi all,
>> > I was going through the article of Bonnie Baker in the latest edition
>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
>> >.
>> > It raised a question in my mind that up to what extent the
>> >clustering of an index affects the avoidance of an ORDER BY clause except
>> >that LIST PREFETCH will not occur.
>> >
>> >Thanks in advance for any help...
>> >
>> >Sanjeev
>> >
>> >
>> >
>>
>>
>> >
>> >
>>
>>
>>


>
>
>


>
>



Roger Miller

Re: Clustering affects sorts
(in response to Mike_Levine@TEKHELP.NET)
If we can use one sort for two purposes, we do. The rule is, "If you want
the
rows in order, you must specify ORDER BY." Otherwise you might be lucky
or partially lucky or out of luck.

There is a new DB2 Magazine article by Bonnie Baker, titled "Looking for
a Little Order?" It's excellent and I recommend it.

Roger


Helen Johnson <[login to unmask email]>@RYCI.COM> on 12/16/99 08:06:09 AM

Please respond to DB2 Data Base Discussion List <[login to unmask email]>

Sent by: DB2 Data Base Discussion List <[login to unmask email]>


To: [login to unmask email]
cc:
Subject: Re: Clustering affects sorts



I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?

Helen

[login to unmask email] wrote:

> Hi,
>
> Although sorts can be avoided by use of an index (clustered or
> non-clustered) in the correct sequence, you should ALWAYS include
> an ORDER BY clause if you want the data in a certain order. DB2
> could always switch to a different index which may require a sort
> or return the data in a different order if the ORDER BY is omitted.
>
> Regards,
>
> Michael Levine
> Premier Data Services, Inc.
>
> >Hi all,
> > I was going through the article of Bonnie Baker in the latest
edition
> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER
BY
> >.
> > It raised a question in my mind that up to what extent the
> >clustering of an index affects the avoidance of an ORDER BY clause
except
> >that LIST PREFETCH will not occur.
> >
> >Thanks in advance for any help...
> >
> >Sanjeev
> >
> >
> >
> the DB2-L webpage at http://www.ryci.com/db2-l. The owners of the list
can
>
> >
> >
>
>
>










Roger Miller

Re: Clustering affects sorts
(in response to Roger Miller)
The degree of clustering affects the costs for access, but if we are using
an index for access, it is ordered by those columns.

I enjoyed Bonnie's article too.
http://www.db2mag.com/winter99/prog.shtml

Roger Miller


[login to unmask email]@RYCI.COM> on 12/15/99 08:07:24 PM

Please respond to DB2 Data Base Discussion List <[login to unmask email]>

Sent by: DB2 Data Base Discussion List <[login to unmask email]>


To: [login to unmask email]
cc:
Subject: Clustering affects sorts



Hi all,
I was going through the article of Bonnie Baker in the latest edition
of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
.
It raised a question in my mind that up to what extent the
clustering of an index affects the avoidance of an ORDER BY clause except
that LIST PREFETCH will not occur.

Thanks in advance for any help...

Sanjeev








[login to unmask email]

Re: Clustering affects sorts
(in response to Roger Miller)
Hi all,
Will DB2 EXPLAIN really show 2 sorts(DISTINCT & ORDER BY).Is not DB2
smart enough to understand that sorting is already done(FOR DISTINCT)
?.Actually my question was not about not coding the ORDER BY clause ,It was
about how clustering affects DB2 not to explicitly sort even if the ORDER
BY clause is coded and columns that are included in the ORDER BY clause is
the part of the index that is used and it is in the same order.
As i understand Indexes are always in order and if it is used for the
access and columns in the order by clause is in the same sequeunce as of
the index then there is no point clustering should affect the sorting and
if the columns in the ORDER BY clause is not in the same order or not even
the part of the index then DB2 have to anyhow do the sorting .
The only thing is that if ORDER BY is coded and columns which are to
be sorted are the part of the index
and clustering is poor then DB2 will try avoiding LIST PREFETCH in this
case and that may cause the additional overhead to DB2.However the columns
which are to be retrieved includes the columns which are not the part of
the index.

Regarding the need of DISTINCT ,if the non-unique index columns are used
for DISTINCT and redundant GROUP BY clause is coded then DB2 may avoid
sorting.

I hope somebody can point out if i am not correct and need some more funda.

Thanks for any suggestions

Sanjeev






Helen Johnson <[login to unmask email]>@RYCI.COM> on 12/16/99 04:06:09 PM

Please respond to DB2 Data Base Discussion List <[login to unmask email]>

Sent by: DB2 Data Base Discussion List <[login to unmask email]>


To: [login to unmask email]
cc:
Subject: Re: Clustering affects sorts


I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?

Helen

[login to unmask email] wrote:

> Hi,
>
> Although sorts can be avoided by use of an index (clustered or
> non-clustered) in the correct sequence, you should ALWAYS include
> an ORDER BY clause if you want the data in a certain order. DB2
> could always switch to a different index which may require a sort
> or return the data in a different order if the ORDER BY is omitted.
>
> Regards,
>
> Michael Levine
> Premier Data Services, Inc.
>
> >Hi all,
> > I was going through the article of Bonnie Baker in the latest
edition
> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER
BY
> >.
> > It raised a question in my mind that up to what extent the
> >clustering of an index affects the avoidance of an ORDER BY clause
except
> >that LIST PREFETCH will not occur.
> >
> >Thanks in advance for any help...
> >
> >Sanjeev
> >
> >
> >
> the DB2-L webpage at http://www.ryci.com/db2-l. The owners of the list
can
>
> >
> >
>
>
>










Jim (C) Chie

Re: Clustering affects sorts
(in response to Michael Hannan)
I wonder is there a real cost to the second sort, or does DB2 simply call
the sort routine which says "I'm already sorted so leave me alone", but it
still shows as a sort in the explain table.

-----Original Message-----
From: Michael Hannan [mailto:[login to unmask email]
Sent: Saturday, December 17, 1994 8:52 AM
To: [login to unmask email]
Subject: Re: Clustering affects sorts


Helen,

DB2 could potentially do any sort it felt like to eliminate duplicates, but
it just so happens that it sorts on columns in sequence specified, ascending
on all of them.

Note however that non-correlated IN subqueries sort for uniqueness into
descending sequence and only 1 sort used despite 2 sort indicators in the
Explain.

Logically and theoretically ORDER BY is required to guarantee the sequence
that you want.

Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
are unique when a subset of columns forms a unique index.
In this case rows could be in any order at all.
ORDER BY is definitely needed here both theoretically and practically.

DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
in general. I often avoid DISTINCT sorts (on single tables) by trying to
include a full unique key, even though I may wish to use DISTINCT to force
materialisation of an intermediate result set.

Otherwise try to avoid using DISTINCT as far as possible.
I am mostly not keen on the performance of DISTINCTs that cause extra sorts.

So in practice there may be cases where you can guarantee that DB2 will do a
DISTINCT sort and where ORDER BY will make no difference other than worse
performance. So although theoretically incorrect, in practice it may work.
For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
that the ORDER BY would be a waste of time.
Untheoretical expediency sometimes influences our judgement.

Commenting on the original.
Clearly poor clustering (90% is already poor, 95% moderate) will strongly
encourage a TS Scan with Sort in preference to Sort Avoidance using the
index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
fetched. The cost of full scan of a poorly clustered index is far too much
I/O.

From: Michael Hannan

>From: Helen Johnson <[login to unmask email]>
>Subject: Re: Clustering affects sorts
>To: [login to unmask email]
>
>I have always told programmers and users to use ORDER BY to be certain of
the data order. But what about when the SQL contains a DISTINCT and the
columns are listed in the same order as
>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
and ORDER BY), wouldn't the ORDER BY sort be redundant?
>
>Helen
>
>[login to unmask email] wrote:
>
>> Hi,
>>
>> Although sorts can be avoided by use of an index (clustered or
>> non-clustered) in the correct sequence, you should ALWAYS include
>> an ORDER BY clause if you want the data in a certain order. DB2
>> could always switch to a different index which may require a sort
>> or return the data in a different order if the ORDER BY is omitted.
>>
>> Regards,
>>
>> Michael Levine
>> Premier Data Services, Inc.
>>
>> >Hi all,
>> > I was going through the article of Bonnie Baker in the latest
edition
>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER
BY
>> >.
>> > It raised a question in my mind that up to what extent the
>> >clustering of an index affects the avoidance of an ORDER BY clause
except
>> >that LIST PREFETCH will not occur.
>> >
>> >Thanks in advance for any help...
>> >
>> >Sanjeev
>> >
>> >
>> >
>> the DB2-L webpage at http://www.ryci.com/db2-l. The owners of the list
can
>>
>> >
>> >
>>
>>
>>


>
>
>


>








Helen Johnson

Re: Clustering affects sorts
(in response to Jim (C) Chie)
Michael,

End user queries are where I see a lot of this. Our end users are using QMF and most are using prompted queries. They don't know or understand their data or the implications of what they
do. The user picks SORT so they can order their data but they also also pick 'Keep single copy of each row' which added the DISTINCT. Usually this is superfluous because the columns
selected are already unique. The user just knows he doesn't want to see two copies of the same row. He doesn't know he is generating another sort. Then I get complaints about DB2 being
slow when they are trying to sort their 300,000 rows twice. It would be nice if DB2 recognized that one sort could satisfy both requirements, but EXPLAIN doesn't indicate that.


Michael Hannan wrote:

> Helen,
>
> DB2 could potentially do any sort it felt like to eliminate duplicates, but
> it just so happens that it sorts on columns in sequence specified, ascending
> on all of them.
>
> Note however that non-correlated IN subqueries sort for uniqueness into
> descending sequence and only 1 sort used despite 2 sort indicators in the
> Explain.
>
> Logically and theoretically ORDER BY is required to guarantee the sequence
> that you want.
>
> Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
> are unique when a subset of columns forms a unique index.
> In this case rows could be in any order at all.
> ORDER BY is definitely needed here both theoretically and practically.
>
> DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
> in general. I often avoid DISTINCT sorts (on single tables) by trying to
> include a full unique key, even though I may wish to use DISTINCT to force
> materialisation of an intermediate result set.
>
> Otherwise try to avoid using DISTINCT as far as possible.
> I am mostly not keen on the performance of DISTINCTs that cause extra sorts.
>
> So in practice there may be cases where you can guarantee that DB2 will do a
> DISTINCT sort and where ORDER BY will make no difference other than worse
> performance. So although theoretically incorrect, in practice it may work.
> For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
> happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
> that the ORDER BY would be a waste of time.
> Untheoretical expediency sometimes influences our judgement.
>
> Commenting on the original.
> Clearly poor clustering (90% is already poor, 95% moderate) will strongly
> encourage a TS Scan with Sort in preference to Sort Avoidance using the
> index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
> fetched. The cost of full scan of a poorly clustered index is far too much I/O.
>
> From: Michael Hannan
>
> >From: Helen Johnson <[login to unmask email]>
> >Subject: Re: Clustering affects sorts
> >To: [login to unmask email]
> >
> >I have always told programmers and users to use ORDER BY to be certain of
> the data order. But what about when the SQL contains a DISTINCT and the
> columns are listed in the same order as
> >the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
> and ORDER BY), wouldn't the ORDER BY sort be redundant?
> >
> >Helen
> >
> >[login to unmask email] wrote:
> >
> >> Hi,
> >>
> >> Although sorts can be avoided by use of an index (clustered or
> >> non-clustered) in the correct sequence, you should ALWAYS include
> >> an ORDER BY clause if you want the data in a certain order. DB2
> >> could always switch to a different index which may require a sort
> >> or return the data in a different order if the ORDER BY is omitted.
> >>
> >> Regards,
> >>
> >> Michael Levine
> >> Premier Data Services, Inc.
> >>
> >> >Hi all,
> >> > I was going through the article of Bonnie Baker in the latest edition
> >> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
> >> >.
> >> > It raised a question in my mind that up to what extent the
> >> >clustering of an index affects the avoidance of an ORDER BY clause except
> >> >that LIST PREFETCH will not occur.
> >> >
> >> >Thanks in advance for any help...
> >> >
> >> >Sanjeev
> >> >
> >> >
> >> >
> >>
> >>
> >> >
> >> >
> >>
> >>
> >>
>
>
> >
> >
> >
>
>
> >
>
>
>



Helen Johnson

Re: Clustering affects sorts
(in response to Michael Hannan)
Michael,

In your tests, was the order of the DISTINCT columns different from the order of the ORDER BY columns, or were both the same? Thanks,

Helen

Michael Hannan wrote:

> Helen,
>
> I would like to take back my previous answer.
>
> My tests seem to indicate now that once SELECT DISTINCT is coded and causes
> a DB2 RDS Sort, additional ORDER BY takes no appreciable additional CPU and
> presumeably utilises just one sort to do both DISTINCT and ORDER BY.
>
> My test included a join with Selected columns from both tables and ORDER BY
> columns 2, 3, 1. CPU roughly same with or without ORDER BY.
>
> I previously had not noticed this except in non-correlated IN subqueries
> doing single sort for both Uniqueness and for Order By.
>
> So therefore by all means code both DISTINCT and ORDER BY.
> No need to assume ordering from DISTINCT.
>
> From: Michael Hannan
>
> >Date: Sun, 18 Dec 1994 02:52:10 +1000
> >To: [login to unmask email]
> >From: Michael Hannan <[login to unmask email]>
> >Subject: Re: Clustering affects sorts
> >
> >Helen,
> >
> >DB2 could potentially do any sort it felt like to eliminate duplicates, but
> it just so happens that it sorts on columns in sequence specified, ascending
> on all of them.
> >
> >Note however that non-correlated IN subqueries sort for uniqueness into
> descending sequence and only 1 sort used despite 2 sort indicators in the
> Explain.
> >
> >Logically and theoretically ORDER BY is required to guarantee the sequence
> that you want.
> >
> >Suppose DB2 avoids a sort for uniqueness, due to knowing selected columns
> are unique when a subset of columns forms a unique index.
> >In this case rows could be in any order at all.
> >ORDER BY is definitely needed here both theoretically and practically.
> >
> >DB2 unfortuneately does not seem to avoid 2 sorts for DISTINCT and ORDER BY
> in general. I often avoid DISTINCT sorts (on single tables) by trying to
> include a full unique key, even though I may wish to use DISTINCT to force
> materialisation of an intermediate result set.
> >
> >Otherwise try to avoid using DISTINCT as far as possible.
> >I am mostly not keen on the performance of DISTINCTs that cause extra sorts.
> >
> >So in practice there may be cases where you can guarantee that DB2 will do
> a DISTINCT sort and where ORDER BY will make no difference other than worse
> performance. So although theoretically incorrect, in practice it may work.
> >For an Ad Hoc query, if I had a guaranteed sort for DISTINCT, I would
> happily leave off my extra ORDER BY if happy with DISTINCT sequence knowing
> that the ORDER BY would be a waste of time.
> >Untheoretical expediency sometimes influences our judgement.
> >
> >Commenting on the original.
> >Clearly poor clustering (90% is already poor, 95% moderate) will strongly
> encourage a TS Scan with Sort in preference to Sort Avoidance using the
> index unless OPTIMIZE FOR n ROWS indicates that a smaller subset will be
> fetched. The cost of full scan of a poorly clustered index is far too much I/O.
> >
> >From: Michael Hannan
> >
> >>From: Helen Johnson <[login to unmask email]>
> >>Subject: Re: Clustering affects sorts
> >>To: [login to unmask email]
> >>
> >>I have always told programmers and users to use ORDER BY to be certain of
> the data order. But what about when the SQL contains a DISTINCT and the
> columns are listed in the same order as
> >>the ORDER BY would be listed. If the EXPLAIN shows only 2 sorts (DISTINCT
> and ORDER BY), wouldn't the ORDER BY sort be redundant?
> >>
> >>Helen
> >>
> >>[login to unmask email] wrote:
> >>
> >>> Hi,
> >>>
> >>> Although sorts can be avoided by use of an index (clustered or
> >>> non-clustered) in the correct sequence, you should ALWAYS include
> >>> an ORDER BY clause if you want the data in a certain order. DB2
> >>> could always switch to a different index which may require a sort
> >>> or return the data in a different order if the ORDER BY is omitted.
> >>>
> >>> Regards,
> >>>
> >>> Michael Levine
> >>> Premier Data Services, Inc.
> >>>
> >>> >Hi all,
> >>> > I was going through the article of Bonnie Baker in the latest edition
> >>> >of DB2 MAG and found some good ideas and some indepth analysis on ORDER BY
> >>> >.
> >>> > It raised a question in my mind that up to what extent the
> >>> >clustering of an index affects the avoidance of an ORDER BY clause except
> >>> >that LIST PREFETCH will not occur.
> >>> >
> >>> >Thanks in advance for any help...
> >>> >
> >>> >Sanjeev
> >>> >
> >>> >
> >>> >
> >>>
> >>>
> >>> >
> >>> >
> >>>
> >>>
> >>>
>
>
> >>
> >>
> >>
>
>
> >>
> >
>
>
>