Question on DELETE with subselect, as opposed to a direct DELETE

Kal Sub

Question on DELETE with subselect, as opposed to a direct DELETE

Hi,

I see a delete statement in an application code like this :

DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3 from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days).

Could I not just write this as 

DELETE from Table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days

The access path for the first query shows 3 lines - an index access with matchcols 2 (for the subquery), and then a sort (due to the IN), and then again an index access with matchcols 2 (for the outer delete).

The access path for the second query is just 1 line - an index access with matchcols 2.

I don't know why the statement was coded with a subquery originally. I seem to remember vaguely that the subquery approach was suggested sometimes to avoid lock contention on the table. But I am not sure if that is really right in this situation.

Appreciate your suggestions.

Thanks,

Regards

Kals

Sam Baugh

Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Kal Sub)
You can if the table in the subselect is same as the table being deleted
from. This format is normally used for UPDATE or DELETE when its based on
a different table. Or it could have been cloned by a developer without
considering that the subselect was unnecessary. Since DB2 v11, the column
attributes need to match, otherwise, they can result with a poor access
path.

On Thu, Dec 14, 2017 at 6:18 PM, Kal Sub <[login to unmask email]> wrote:

> Hi,
>
> I see a delete statement in an application code like this :
>
> DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3
> from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7
> days).
>
> Could I not just write this as
>
> DELETE from Table1 where col1 = ? and col2 = ? and col3 < current
> timestamp - 7 days
>
> The access path for the first query shows 3 lines - an index access with
> matchcols 2 (for the subquery), and then a sort (due to the IN), and then
> again an index access with matchcols 2 (for the outer delete).
>
> The access path for the second query is just 1 line - an index access
> with matchcols 2.
>
> I don't know why the statement was coded with a subquery originally. I
> seem to remember vaguely that the subquery approach was suggested sometimes
> to avoid lock contention on the table. But I am not sure if that is really
> right in this situation.
>
> Appreciate your suggestions.
>
> Thanks,
>
> Regards
>
> Kals
>
> -----End Original Message-----
>

Kal Sub

RE: Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Sam Baugh)

Thanks, Sam. Yes, it is the same table. 

Could you please elaborate your comment about the column attributes having to match since v11. ? I don't think I quite got that.

Regards

Kals


In Reply to Sam Baugh:

You can if the table in the subselect is same as the table being deleted
from. This format is normally used for UPDATE or DELETE when its based on
a different table. Or it could have been cloned by a developer without
considering that the subselect was unnecessary. Since DB2 v11, the column
attributes need to match, otherwise, they can result with a poor access
path.

On Thu, Dec 14, 2017 at 6:18 PM, Kal Sub <[login to unmask email]> wrote:

> Hi,
>
> I see a delete statement in an application code like this :
>
> DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3
> from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7
> days).
>
> Could I not just write this as
>
> DELETE from Table1 where col1 = ? and col2 = ? and col3 < current
> timestamp - 7 days
>
> The access path for the first query shows 3 lines - an index access with
> matchcols 2 (for the subquery), and then a sort (due to the IN), and then
> again an index access with matchcols 2 (for the outer delete).
>
> The access path for the second query is just 1 line - an index access
> with matchcols 2.
>
> I don't know why the statement was coded with a subquery originally. I
> seem to remember vaguely that the subquery approach was suggested sometimes
> to avoid lock contention on the table. But I am not sure if that is really
> right in this situation.
>
> Appreciate your suggestions.
>
> Thanks,
>
> Regards
>
> Kals
>
> -----End Original Message-----
>

Michael Hannan

RE: Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Kal Sub)

Kal,

Db2 is probably transforming the IN subquery into a join like access path, and I agree the Sort is for duplicate removal when col1, col2, and col3 do not form a unique key.

Through transitive closure, all the local subquery predicates are implied to apply to the DELETE table as well. So you are quite correct, that the IN subquery has no use at all. Merely degrades the performance. Probably the coder misunderstood some technique and applied it incorrectly. Would not be the first time we ever saw an unnecessary join or subquery. On a slight tangent, in the old days people talked about joining a table to itself in order to overcome Stage 2 nature of a predicate like COL1 = COL2. That really made no sense as well. 

If a subquery is needed in a Delete (not in this example), I don't like to see IN construct unless DB2 can definitely internally rewrite it as a join access path.

The IN subquery with DELETE could be useful to limit the number of keys being deleted in a unit of work (before COMMIT) if the subquery included a FETCH FIRST n ROWS ONLY clause and perhaps a DISTINCT. In DB2 V12 the FETCH FIRST n ROWS ONLY clause can appear directly on the DELETE, probably reducing need for DELETEs with self referencing subqueries.

I am also not sure what Sam was getting at re attributes needing to match since Db2 11. I recall a special comment about matching attributes for pruning parts visited in a DPSI via a join predicate, at V11 (with several DPSI advances), but doubt if that applied in earlier version DPSI pruning due to transitive closure predicates generated from a join (V8). This may have nothing to do with Sam's comment. Just a wild guess.

 
In Reply to Kal Sub:

Hi,

I see a delete statement in an application code like this :

DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3 from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days).

Could I not just write this as 

DELETE from Table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days

The access path for the first query shows 3 lines - an index access with matchcols 2 (for the subquery), and then a sort (due to the IN), and then again an index access with matchcols 2 (for the outer delete).

The access path for the second query is just 1 line - an index access with matchcols 2.

I don't know why the statement was coded with a subquery originally. I seem to remember vaguely that the subquery approach was suggested sometimes to avoid lock contention on the table. But I am not sure if that is really right in this situation.

Appreciate your suggestions.

Thanks,

Regards

Kals

 

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 15, 2017 - 03:05 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 15, 2017 - 03:17 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 15, 2017 - 03:32 AM (Europe/Berlin)

Sam Baugh

Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Kal Sub)
In our case, the table being updated had a CHAR(12) column (last 2
characters are always spaces(purchased system)), and the value from the
subselect was CHAR(10). This did not qualify as matching, but when we
appended the 2 spaces to get to char(12), it started using an index. This
was not an issue on DB2 v10 z/OS.

On Thu, Dec 14, 2017 at 7:33 PM, Kal Sub <[login to unmask email]> wrote:

> Thanks, Sam. Yes, it is the same table.
>
> Could you please elaborate your comment about the column attributes having
> to match since v11. ? I don't think I quite got that.
>
> Regards
>
> Kals
>
>
> In Reply to Sam Baugh:
>
> You can if the table in the subselect is same as the table being deleted
> from. This format is normally used for UPDATE or DELETE when its based on
> a different table. Or it could have been cloned by a developer without
> considering that the subselect was unnecessary. Since DB2 v11, the column
> attributes need to match, otherwise, they can result with a poor access
> path.
>
> On Thu, Dec 14, 2017 at 6:18 PM, Kal Sub wrote:
>
> > Hi,
> >
> > I see a delete statement in an application code like this :
> >
> > DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3
> > from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7
> > days).
> >
> > Could I not just write this as
> >
> > DELETE from Table1 where col1 = ? and col2 = ? and col3 < current
> > timestamp - 7 days
> >
> > The access path for the first query shows 3 lines - an index access with
> > matchcols 2 (for the subquery), and then a sort (due to the IN), and then
> > again an index access with matchcols 2 (for the outer delete).
> >
> > The access path for the second query is just 1 line - an index access
> > with matchcols 2.
> >
> > I don't know why the statement was coded with a subquery originally. I
> > seem to remember vaguely that the subquery approach was suggested
> sometimes
> > to avoid lock contention on the table. But I am not sure if that is
> really
> > right in this situation.
> >
> > Appreciate your suggestions.
> >
> > Thanks,
> >
> > Regards
> >
> > Kals
> >
> > -----End Original Message-----
> >
>
>
> -----End Original Message-----
>

Daniel Luksetich

Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Michael Hannan)
First of all it seems that in this case the subquery is not needed. Now, regarding using a non-correlated versus correlated subqueries is a matter of available indexes and quantity of data. Although it’s a bit dated, this is still a pretty good explanation (page 2):



http://www.enterprisesystemsmedia.com/it-management/db2-for-z-os-sql-performance-choices http://www.enterprisesystemsmedia.com/it-management/db2-for-z-os-sql-performance-choices



Now, regarding row expressions. DB2 is able to match of leading index columns for row expressions, and if not they are processed as stage 2 predicates (my information may be a bit dated, but basically you only want to use them if they are indexable).



Cheers,

Dan



Daniel L Luksetich

DanL Database Consulting



IBM GOLD Consultant

IBM Champion for Analytics

IDUG Content Committee Past-Chairman

IBM Certified Database Adminstrator – DB2 11 DBA for z/OS

IBM Certified System Administrator – DB2 11 for z/OS

IBM Certified Application Developer – DB2 11 for z/OS

IBM Certified Advanced Database Administrator – DB2 10.1 for Linux UNIX and Windows



From: Michael Hannan [mailto:[login to unmask email]
Sent: Thursday, December 14, 2017 8:03 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: Question on DELETE with subselect, as opposed to a direct DELETE



Kal,

Db2 is probably transforming the IN subquery into a join like access path, and I agree the Sort is for duplicate removal when col1, col2, and col3 do not form a unique key.

Through transitive closure, all the local subquery predicates are implied to apply to the DELETE table as well. So you are quite correct, that the IN subquery has no use at all. Probably the coder misunderstood some technique and applied it incorrectly. Would not be the first time we ever saw an unnecessary join or subquery. On a slight tangent, in the old days people talked about joining a table to itself in order to overcome Stage 2 nature of a predicate like COL1 = COL2. That really made no sense as well.

If a subquery is needed in a Delete (not in this example), I don't like to see IN construct unless DB2 can definitely internally rewrite it as a join access path. I prefer a Correlated EXISTS subquery if a good index is available.

The IN subquery with DELETE could be useful to limit the number of keys being deleted in a unit of work (before COMMIT) if the subquery included a FETCH FIRST n ROWS ONLY clause and perhaps a DISTINCT. In DB2 V12 the FETCH FIRST n ROWS ONLY clause can appear directly on the DELETE, probably reducing need for DELETEs with self referencing subqueries.


In Reply to Kal Sub:

Hi,

I see a delete statement in an application code like this :

DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3 from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days).

Could I not just write this as

DELETE from Table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days

The access path for the first query shows 3 lines - an index access with matchcols 2 (for the subquery), and then a sort (due to the IN), and then again an index access with matchcols 2 (for the outer delete).

The access path for the second query is just 1 line - an index access with matchcols 2.

I don't know why the statement was coded with a subquery originally. I seem to remember vaguely that the subquery approach was suggested sometimes to avoid lock contention on the table. But I am not sure if that is really right in this situation.

Appreciate your suggestions.

Thanks,

Regards

Kals



Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd



-----End Original Message-----

Michael Hannan

RE: Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Daniel Luksetich)

We have drifted right off the original topic here. We are agreed the subquery was unnecessary.

So this comment at the bottom is more about IN subqueries in general.

In Reply to Daniel Luksetich:

First of all it seems that in this case the subquery is not needed. Now, regarding using a non-correlated versus correlated subqueries is a matter of available indexes and quantity of data. Although it’s a bit dated, this is still a pretty good explanation (page 2):

http://www.enterprisesystemsmedia.com/it-management/db2-for-z-os-sql-performance-choices http://www.enterprisesystemsmedia.com/it-management/db2-for-z-os-sql-performance-choices
Now, regarding row expressions. DB2 is able to match of leading index columns for row expressions, and if not they are processed as stage 2 predicates (my information may be a bit dated, but basically you only want to use them if they are indexable).

I agree that IN (an NOT IN) subqueries are dangerous when they are Stage 2, when not transformable to a join and failing to get Access Type = 'N'. This can happen for more complex subqueries or for all NOT INs. Db2 can transform some NOT INs to correlated NOT EXISTS. In modern Db2 there are lots of transformations involving added correlation or de-correlation. In order not to take risks I prefer to do the transform myself when writing an SQL, rather than risk not satisfying the complex conditions for transformation, not all documented. 

I generally use Correlated NOT EXISTS or EXISTS when a good index is available. I prefer to write joins instead of IN subqueries, where possible (not possible on DELETE or UPDATE) or unless I am certain DB2 will rewrite to a join, or use access type N, the other way to get an index match (where not transformed to a join).

The very old form of Stage 2 IN or the workfile sorted into descending sequence with duplicate removal, is dangerous in some circumstances. Not easy to tune it when performing badly except by SQL change. Joins and Correlated EXISTS subqueries are much easier to tune using indexes. For smaller examples, Db2 in-memory dynamically built Indexes maybe adequate.

That is why (NOT) IN with subquery is my method of last resort, only when the Correlated (NOT) EXISTS is not workable (impossible to index suitably), and the join including outer join) is not workable. IN is my last preference, in the Stage 2 scenario (which was originally the normal scenario). The more complex the subquery, preventing internal transformation, the more dangerous.

I like to get a good access path on my first attempt at writing an SQL, without finding a bad access path IN (and bad performance) and having to rewrite it. We don't always have time to test all the alternative ways, so want a way that is reliable.

Unfortunately throughout the world of application programming, IN and NOT IN tend to be the first preference of many and I have seen some really poor performing examples, fixable only by SQL re-write. (Row-expression) NOT IN (subquery) with or without correlation is a particularly ugly example. I am working on an article that shows an example where the coder used (row-expression) IN (subquery) and it performed 100 times worse than the join and EXISTS variations (that had O.K. indexes). It is so easy to use it unwisely, as a first preference.

I think the article you quoted (with quick read) did not discuss transformations to join, in memory cache assist, and did not cover the NOT variations.  Maybe the best way "depends", but we have to try someday to give clearly understandable advice on how to solve these problems  with reliable performance.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Peter Vanroose

Re: Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Michael Hannan)

In Reply to Michael Hannan:

We have drifted right off the original topic here. We are agreed the subquery was unnecessary.

Actually, in combination with DELETE or UPDATE, a materialised, non-correlated subquery could be a very good idea even when it seems logically & performancewise unnecessary!
Because it will guarantee that the list of elements to be manipulated is built before the table starts to be modified.

A purely index based (and non list-prefetch) access during a delete might change the data on which the WHERE condition is based, while traversing the data. (For example when there are triggers on the table.)


--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

Michael Hannan

Re: Question on DELETE with subselect, as opposed to a direct DELETE
(in response to Peter Vanroose)

Peter,

An interesting thought. Yes we do utilise SQL tricks at times, for a one off. However we don't want things that will behave differently depending on the access path too much, other than avoiding locking problems perhaps. You might have had to use some technique to force materialise the IN List in a work file, if unique keys were selected that did not automatically materialise.

For an UPDATE, I usually make sure my selection criteria will omit all rows already updated, even though I know Db2 will not log any change for a value changed to itself. For DELETE that is not applicable. Already deleted rows cannot re-qualify.

My exception after table LOAD, I have been known to issue a dummy UPDATE that changed a column to same value loaded, merely to fire a Trigger based on Update to that column, since triggers are not fired by the Load.

I think the materialisation technique you mention has limited usefulness.  Maybe for an UPDATE use the subquery to count the number of rows to be updated in a unit of work (before COMMIT), is an application I can think of. I did not understand the scenario in your last paragraph. We can already code FETCH FIRST n ROWS on a DELETE.

Going back to the original question:

"DELETE from Table1 where (col1, col2, col3) in (select col1, col2, col3 from table1 where col1 = ? and col2 = ? and col3 < current timestamp - 7 days).

The access path for the first query shows 3 lines - an index access with matchcols 2 (for the subquery), and then a sort (due to the IN), and then again an index access with matchcols 2 (for the outer delete)."

Peter has made me notice that the DELETE with subquery may lock the data pages subjected to DELETE for less time. The subquery will read the target pages into the bufferpool, then the DELETE part will probably not have to wait for any Sync I/Os while locking data pages updated. I assume an immediately following COMMIT.

This technique would be more applicable when the subquery predicates were on  a non-indexed column requiring a long scan to find a small number of rows to be deleted, and subquery selected Values of Indexed columns to be used for lookup by the DELETE (with a suitable access path). This could allow locks for DELETE to be held for a short time, rather than duration of a full scan.

In Reply to Peter Vanroose:

In Reply to Michael Hannan:

We have drifted right off the original topic here. We are agreed the subquery was unnecessary.

Actually, in combination with DELETE or UPDATE, a materialised, non-correlated subquery could be a very good idea even when it seems logically & performancewise unnecessary!
Because it will guarantee that the list of elements to be manipulated is built before the table starts to be modified.

A purely index based (and non list-prefetch) access during a delete might change the data on which the WHERE condition is based, while traversing the data. (For example when there are triggers on the table.)


--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

 

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 19, 2017 - 12:41 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 19, 2017 - 12:42 AM (Europe/Berlin)