New index produces duplicate in program

[login to unmask email]

New index produces duplicate in program
Hi,
We have a PeopleSoft SQL that was running slow, 6+ hours. The solution
for a slow running SQL, was to create an index that contained all the
fields need by the SQL. The index has three fields, one field to satisfy
the predicate of the SQL (field1) which and two fields to satisfy the data
retrieval of the SQL(field2, field3 making the index satisfy the SQL as an
index only operation). The SQL now runs faster, but returns duplicate
rows. In reviewing the program we find that the cursor is opened and a
loop on a fetch and later in the loop field2 is updated.(changed from a NO
to a YES). Putting displays in the COBOL program we find the fetch is
returning first the original row then the updated row. Remove the updated
field from the index (field2) and the program runs successfully, no
duplicates but longer. Any thought on what is going on in DB2 to produce
the results or why the updated row is returned? The SQL does have a Order
by on field 1.

Thanks in advance
Patrick

Mark Doyle

Re: New index produces duplicate in program
(in response to Patrick_D_Poziwilko@WHIRLPOOL.COM)
The information transmitted is intended only for the person or entity to
which it is addressed and may contain confidential and/or privileged
material. If the reader of this message is not the intended recipient,
you are hereby notified that your access is unauthorized, and any review,
dissemination, distribution or copying of this message including any
attachments is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete the material from any
computer.



Phil Grainger

Re: New index produces duplicate in program
(in response to Mark Doyle)
If the row is updated OUTSIDE the cursor (in other words, NOT doing an
UPDATE WHERE CURRENT OF), then you will see the results you describe. DB2
finds the row the first time (with the "NO" in the field), after the
non-cursor update it will then find the row AGAIN (with "YES" in the field)
as DB2 has moved the index entry further down the index (YES collates higher
than NO).

Unfortunately (for you) this is working as designed - IBM document the fact
that processing a cursor whilst at the same time updating the table OUTSIDE
the cursor can have "unpredictable results".

Suggestions for improvement/correction include NOT doing the update this way
or perhaps having the YES/NO column in the index in DESCENDING order
(assuming that the update only ever happens one way!)


Phil Grainger
Computer Associates
Product Manager, DB2
Tel: +44 (0)161 928 9334
Fax: +44 (0)161 941 3775
Mobile: +44 (0)7970 125 752
[login to unmask email]

-----Original Message-----
From: [login to unmask email]
[mailto:[login to unmask email]
Sent: 06 January 2003 14:29
To: [login to unmask email]
Subject: [DB2-L] New index produces duplicate in program



Hi,
We have a PeopleSoft SQL that was running slow, 6+ hours. The solution for
a slow running SQL, was to create an index that contained all the fields
need by the SQL. The index has three fields, one field to satisfy the
predicate of the SQL (field1) which and two fields to satisfy the data
retrieval of the SQL(field2, field3 making the index satisfy the SQL as an
index only operation). The SQL now runs faster, but returns duplicate
rows. In reviewing the program we find that the cursor is opened and a loop
on a fetch and later in the loop field2 is updated.(changed from a NO to a
YES). Putting displays in the COBOL program we find the fetch is returning
first the original row then the updated row. Remove the updated field from
the index (field2) and the program runs successfully, no duplicates but
longer. Any thought on what is going on in DB2 to produce the results or
why the updated row is returned? The SQL does have a Order by on field 1.

Thanks in advance
Patrick

Bernd Oppolzer

Re: New index produces duplicate in program
(in response to Phil Grainger)
The problem is: if you do an UPDATE which changes a column in the index to a
larger value, the cursor will once again fetch the changed row, because the
entry in the index is moved ahead of the read position of the cursor.
This is due to the fact, that this changed index also supports the
cursor-select.

This problem is known as "halloween problem" (well, not exactly, but the
halloween problem is similar) and first appeared in the design phase of System
R, in the late 70's. It was discovered the day before halloween, if I recall it
right, and therefore was called "halloween problem".

I'll give you an example:

update emp
set salary = salary * 1.1
where salary < 20000;

If this UPDATE would use the salary index and it would find for example a
salary of 10000, this salary would be updated to 11000 and deleted/inserted in
the index. So some time later the row will be retrieved again and changed to
12100 and so on. This way, all salaries will be >= 20000 in the end.

The problem was solved this way: DB2 never uses an index on the updated
columns in an UPDATE or in a CURSOR FOR UPDATE. So, this kind of problem can
only appear, if you do normal (keyed) updates while working on a read/only
cursor.

System R is the grandfather/grandmother of DB2. I learned about this thru some
discussions about the history of DB2 on this list. There are some sites which
cover the history of DB2, you will find them with google or so. I don't recall
the URLs.

Regards

Bernd



Am Mon, 06 Jan 2003 schrieben Sie:
>
>
> Hi,
> We have a PeopleSoft SQL that was running slow, 6+ hours. The solution for
> a slow running SQL, was to create an index that contained all the fields
> need by the SQL. The index has three fields, one field to satisfy the
> predicate of the SQL (field1) which and two fields to satisfy the data
> retrieval of the SQL(field2, field3 making the index satisfy the SQL as an
> index only operation). The SQL now runs faster, but returns duplicate
> rows. In reviewing the program we find that the cursor is opened and a loop
> on a fetch and later in the loop field2 is updated.(changed from a NO to a
> YES). Putting displays in the COBOL program we find the fetch is returning
> first the original row then the updated row. Remove the updated field from
> the index (field2) and the program runs successfully, no duplicates but
> longer. Any thought on what is going on in DB2 to produce the results or
> why the updated row is returned? The SQL does have a Order by on field 1.
>
> Thanks in advance
> Patrick
>
>



Terry Purcell

Re: New index produces duplicate in program
(in response to Bernd Oppolzer)
A monir clarification in Bernd's excellent reply.....

Index usage is possible while still avoiding the Halloween Update problem,
but only if list prefetch or multi-index access is chosen. So you will not
see index usage without list prefetch if the index columns (of the select
index) are being updated and you do not have equals predicates on all index
columns.

But as suggested, you have created your own Halloween update problem. And
there is not much that DB2 can do to overcome application introduced logic
or performance problems.

Bonnie Baker wrote an excellent article on this program logic issue recently
in the DB2 magazine. The only point I saw missing was that if you put all of
the work into DB2 (by combining the UPDATE and SELECT into one), then this
logic problem was resolved many years ago by IBM (as Bernd pointed out). So
why reinvent the wheel??

Regards
Terry Purcell
Yevich Lawson & Assoc Inc (YL&A)
IBM Gold Consultant - DB2
IBM Certified Solutions Expert
DB2 V7 Database Administration OS/390
http://www.ylassoc.com


-----Original Message-----
From: DB2 Data Base Discussion List
[mailto:[login to unmask email]On Behalf Of Bernd Oppolzer
Sent: Monday, January 06, 2003 1:18 PM
To: [login to unmask email]
Subject: Re: New index produces duplicate in program


The problem is: if you do an UPDATE which changes a column in the index to a
larger value, the cursor will once again fetch the changed row, because the
entry in the index is moved ahead of the read position of the cursor.
This is due to the fact, that this changed index also supports the
cursor-select.

This problem is known as "halloween problem" (well, not exactly, but the
halloween problem is similar) and first appeared in the design phase of
System
R, in the late 70's. It was discovered the day before halloween, if I recall
it
right, and therefore was called "halloween problem".

I'll give you an example:

update emp
set salary = salary * 1.1
where salary < 20000;

If this UPDATE would use the salary index and it would find for example a
salary of 10000, this salary would be updated to 11000 and deleted/inserted
in
the index. So some time later the row will be retrieved again and changed to
12100 and so on. This way, all salaries will be >= 20000 in the end.

The problem was solved this way: DB2 never uses an index on the updated
columns in an UPDATE or in a CURSOR FOR UPDATE. So, this kind of problem can
only appear, if you do normal (keyed) updates while working on a read/only
cursor.

System R is the grandfather/grandmother of DB2. I learned about this thru
some
discussions about the history of DB2 on this list. There are some sites
which
cover the history of DB2, you will find them with google or so. I don't
recall
the URLs.

Regards

Bernd



Am Mon, 06 Jan 2003 schrieben Sie:
>
>
> Hi,
> We have a PeopleSoft SQL that was running slow, 6+ hours. The solution
for
> a slow running SQL, was to create an index that contained all the fields
> need by the SQL. The index has three fields, one field to satisfy the
> predicate of the SQL (field1) which and two fields to satisfy the data
> retrieval of the SQL(field2, field3 making the index satisfy the SQL as an
> index only operation). The SQL now runs faster, but returns duplicate
> rows. In reviewing the program we find that the cursor is opened and a
loop
> on a fetch and later in the loop field2 is updated.(changed from a NO to a
> YES). Putting displays in the COBOL program we find the fetch is
returning
> first the original row then the updated row. Remove the updated field
from
> the index (field2) and the program runs successfully, no duplicates but
> longer. Any thought on what is going on in DB2 to produce the results or
> why the updated row is returned? The SQL does have a Order by on field 1.
>
> Thanks in advance
> Patrick
>
>