SELECT / UNION question.

John McKown

SELECT / UNION question.
I have a SELECT / UNION statement and I think that I may be doing some
"overkill". The statement looks like:

SELECT DISTINCT *
FROM DB1
UNION
SELECT DISTINCT *
FROM DB2
;

If I read the book correctly, the UNION will eliminate all duplicate rows so
using the DISTINCT on the SELECT is not really necessary. Does the DISTINCT
help or hurt? Both tables are defined identically, even to the column names.
There may well be duplicates within each table and between the tables.

Also as an aside, my defination of a "duplicate row" is one in which all the
columns SELECTed have the same value. Is that correct?

----------------------------------------------------------------------
John McKown
HealthAxis

All opinions are my own and are not the opinions of my employer.

Unsolicited telephone calls from vendors are NOT appreciated and tend to
upset my management.

Divine intervention---What happens when God steps in and does something for
the good of mankind...like ending Ronald Reagan's movie career.



Terry Purcell

Re: SELECT / UNION question.
(in response to John McKown)
John,

It depends (sorry)...

If you select all columns, and have a unique index on each table, then the
DISTINCT is discarded (due to the uniqueness guaranteed by the index).
Change it to select columns that are not within a unique index and the
DISTINCT will perform an additional sort. You specify that there may be
duplicates within each table, so you must not have a unique index, so will
be performing additional sorts.

The UNION takes care of the duplicates, so the DISTINCT is logically
redundant.

However, the DISTINCT (to remove the duplicates) before the UNION (which
removes the final duplicates) may be a benefit. A GROUP BY that is able to
avoid a sort is even better in this situation, than a DISTINCT that requires
a sort.

Explain will show three sorts in this case (with DISTINCTs), whereas without
the DISTINCTs, you will have one sort. As mentioned, this does not always
mean better performance.

As always, test out which SQL performs better for you.

To your final question....yes...a "duplicate row" is one in which all the
columns SELECTed have the same value.

Regards
Terry Purcell
Yevich Lawson & Assoc Inc (YL&A)
IBM Gold Consultant - DB2
IBM Certified Solutions Expert
DB2 V7 Database Administration OS/390
http://www.ylassoc.com


-----Original Message-----
From: DB2 Data Base Discussion List [mailto:[login to unmask email]On Behalf Of
McKown, John
Sent: Friday, December 28, 2001 10:00 AM
To: [login to unmask email]
Subject: SELECT / UNION question.


I have a SELECT / UNION statement and I think that I may be doing some
"overkill". The statement looks like:

SELECT DISTINCT *
FROM DB1
UNION
SELECT DISTINCT *
FROM DB2
;

If I read the book correctly, the UNION will eliminate all duplicate rows so
using the DISTINCT on the SELECT is not really necessary. Does the DISTINCT
help or hurt? Both tables are defined identically, even to the column names.
There may well be duplicates within each table and between the tables.

Also as an aside, my defination of a "duplicate row" is one in which all the
columns SELECTed have the same value. Is that correct?

----------------------------------------------------------------------
John McKown
HealthAxis

All opinions are my own and are not the opinions of my employer.

Unsolicited telephone calls from vendors are NOT appreciated and tend to
upset my management.

Divine intervention---What happens when God steps in and does something for
the good of mankind...like ending Ronald Reagan's movie career.