Unusual abends in DB2 reorg jobs

Mark McCormack

Unusual abends in DB2 reorg jobs
We are experiencing unusual (at least to me) abends on some reorgs by
partition. Environment: DB2 V6 on OS/390 V2.10.

We have a growing application with many tablespaces each with many
partitions, no NPIs. We have a standalone window in which to run SHRLEVEL
REFERENCE reorgs by partition. Many reorgs run simultaneously, for
different tablespaces and for multiple partitions within one tablespace. A
small fraction of these reorgs fail in the utilterm phase. This happens
consistently week to week on 2 DB2 subsystems, and it is annoying. Before
submitting this to IBM as a possible bug, I seek the advice of this
esteemed group of experts (how's that for sucking up?). I will break this
into two subtopics. Has anyone else experienced this? Does anyone have
any advice on either subtopic?

Thanks,
Mark McCormack
State Street Corp.

Subtopic #1: the reorg failure
---------------
The reorg of a single partition fails. The SYSPRINT msgs are not really
helpful. Example:
DSNU017I DSNUGSAT - UTILITY DATA BASE SERVICES MEMORY EXECUTION ABENDED,
REASON=X'00C90206'
DSNU016I DSNUGBAC - UTILITY BATCH MEMORY EXECUTION ABENDED,
REASON=X'00E40347'

-DIS UTILITY yields:
DSNU100I -D2G1 DSNUGDIS - USERID = OPTC
MEMBER =
UTILID = GX70229.REORG
PROCESSING UTILITY STATEMENT 1
UTILITY = REORG
PHASE = UTILTERM COUNT = 0
STATUS = STOPPED

-DIS DB yields:
NAME     TYPE PART STATUS
-----------------------------------------------------------------
EDTTXN   TS   025  RW,COPY,UTUT
EDTTXN01 IX   025  RW,UTUT

The reorg cntl stmt contains: STATISTICS TABLE(ALL) INDEX(ALL) KEYCARD

When the abend occurs, msgs like the following are appearing on DB2MSTR:
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000302
NAME DSNDB06 .SYSSTATS.X'00000391'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000303
NAME DSNDB06 .DSNTNX01.X'00000007'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.25 STC31342 DSN3201I -D2G1 ABNORMAL EOT IN PROGRESS FOR USER=OPTC
CONNECTION-ID=UTILITY CORRELATION-ID=GX70229
JOBNAME=GX70229
TCB=009D0810

This suggests that the problem involves updating SYSSTATS. Since
similar problems have occurred on 2 separate subsystems (no data sharing
connection), I doubt there is anything actually wrong with SYSSTATS
itself. This past weekend we ran without STATISTICS on the reorg cntl
stmts, and we had no failures. That is not conclusive proof, but it is
getting curiouser and curiouser.
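
For anyone collecting these messages across many weekly runs, the DSNI013I
blocks quoted above can be tallied mechanically. What follows is only a rough
sketch in Python, not anything DB2 supplies: the names parse_dsni013i and
page_set are made up for illustration, and the parsing assumes the job log
lines look exactly like the ones pasted in this post (a message line followed
by "LABEL value" or "LABEL=value" detail lines).

```python
import re
from collections import Counter

# Any DB2 message id such as DSNI013I, DSNU017I or DSN3201I.
MSG_ID = re.compile(r"\bDSN\w\d{3}[A-Z]\b")
# Detail lines: "LABEL value" or "LABEL=value" (layout assumed from the post).
DETAIL = re.compile(r"\s*([A-Z-]+)[ =]\s*(.*\S)")


def parse_dsni013i(log_text):
    """Return one dict of detail fields per DSNI013I message in log_text."""
    events, current = [], None
    for line in log_text.splitlines():
        found = MSG_ID.search(line)
        if found:
            # A new message starts here; only DSNI013I opens a record.
            current = {} if found.group() == "DSNI013I" else None
            if current is not None:
                events.append(current)
            continue
        detail = DETAIL.match(line)
        if current is not None and detail:
            label, value = detail.groups()
            # NAME is printed with padding ("DSNDB06 .SYSSTATS..."); squeeze it.
            current[label] = value.replace(" ", "") if label == "NAME" else value
    return events


def page_set(name):
    """Drop the trailing page number, e.g. ".X'00000391'"."""
    return name.split(".X'")[0]


SAMPLE = """\
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000302
NAME DSNDB06 .SYSSTATS.X'00000391'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000303
NAME DSNDB06 .DSNTNX01.X'00000007'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.25 STC31342 DSN3201I -D2G1 ABNORMAL EOT IN PROGRESS FOR USER=OPTC
CONNECTION-ID=UTILITY CORRELATION-ID=GX70229
"""

events = parse_dsni013i(SAMPLE)
# Count how often each catalog object shows up, ignoring the page number.
by_object = Counter(page_set(e["NAME"]) for e in events)
```

Grouping by object rather than by page makes it easier to see whether every
failure touches the same catalog objects, which is the pattern that points at
the STATISTICS option rather than at one damaged page.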


Subtopic #2: unexpected result of reorg restart in utilterm phase
---------------
We restart the failed reorg in the utilterm phase. It completes quickly,
rc=4.
Tablespace status is changed to RW,COPY
Index status is changed to RW

The reorg cntl stmt contained: COPYDDN(SYSCOPY)
The sysprint from the failed reorg contained (in msgs from the reload
phase):
COPY PROCESSED FOR TABLESPACE DB702EDR.EDTTXN PART 25

The restarted utilterm phase:
removes UTUT from the objects' status
posts to syscopy the row for 'reorg successful'
does not post to syscopy the row for successful inline copy
leaves the tablespace partition copy pending.

Although the inline copy is created in the reload phase, I know it cannot
be posted to syscopy until after 'reorg successful' is posted to syscopy.
I assumed that both rows would be posted to syscopy in the restarted
utilterm phase. It appears that the inline copy is forgotten when the
original reorg run fails and is restarted. A separate image copy run
clears up copy pending.
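
To spot partitions left in this state after a batch of restarts, the
before and after statuses can be compared with a few lines of Python. This is
a minimal sketch under stated assumptions: parse_display and compare are
hypothetical helper names, the rows use the four-column NAME TYPE PART STATUS
layout of the -DIS DB output shown in subtopic #1, and the "after" rows are
rebuilt from the statuses described above rather than copied from a real
display.

```python
def parse_display(rows):
    """Map (name, type, part) to the set of status codes on each data row."""
    states = {}
    for row in rows:
        cols = row.split()
        # Data rows carry four columns: NAME TYPE PART STATUS.
        if len(cols) == 4 and cols[1] in ("TS", "IX") and cols[2].isdigit():
            name, kind, part, status = cols
            states[(name, kind, int(part))] = set(status.split(","))
    return states


def compare(before, after):
    """Return (cleared, remaining) status codes per object, ignoring RW."""
    cleared, remaining = {}, {}
    for key, old in before.items():
        new = after.get(key, set())
        cleared[key] = old - new          # states the restart removed
        remaining[key] = new - {"RW"}     # restrictive states still set
    return cleared, remaining


# Rows as displayed after the original failure.
BEFORE = parse_display([
    "NAME TYPE PART STATUS",
    "EDTTXN TS 025 RW,COPY,UTUT",
    "EDTTXN01 IX 025 RW,UTUT",
])
# Assumption: rows rebuilt from the statuses described after the restart.
AFTER = parse_display([
    "EDTTXN TS 025 RW,COPY",
    "EDTTXN01 IX 025 RW",
])

cleared, remaining = compare(BEFORE, AFTER)
```

Any object with a non-empty entry in remaining still needs attention; here
that is the tablespace partition, which stays copy pending until a separate
image copy is taken.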

Should the restarted reorg post the inline copy row to syscopy? Is this a
bug?



Max Scarpa

Re: Unusual abends in DB2 reorg jobs
(in response to Mark McCormack)
Hi Mark

Yes, it happened to us sometimes (same environment: DB2 V6, OS/390 2.10)
during SHRLEVEL REFERENCE reorgs, and at the time it seemed to be a problem
in the switch/rename phase. What was annoying was that .S0001 data sets
(mirror tablespaces) were sometimes left behind, wasting DASD space (if I
remember correctly, there were sometimes uncataloged fragments as well).

When we applied some OS/390 PTFs (for other reasons), the problem
apparently disappeared. There are some entries in the IBM APAR database;
see for instance PQ46811.

HTH

Max Scarpa



Jeremiah Eden

Re: Unusual abends in DB2 reorg jobs
(in response to Max Scarpa)
I don't know your maintenance level but you might look at APAR PQ59770.
There are several current APARs for this abend. Do you have a LOGREC entry
you can paste in here?

APAR PQ59770
When multiple RUNSTATS utilities are executed concurrently in
DB2 data sharing members, ABEND04E RC00C90206 is issued from
DSNIDIFS vrace5007 with following message (msgDSNI013I):
DSNI013I -DBD1 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000302
NAME DSNDB06.SYSSTATS.X'0000104B'
CONNECTION-ID=UTILITY
CORRELATION-ID=C4CTQ601
LUW-ID=*


