We are experiencing unusual (at least to me) abends on some reorgs by
partition. Environment: DB2 v6, OS/390 v2.10.
We have a growing application with many tablespaces each with many
partitions, no NPIs. We have a standalone window in which to run SHRLEVEL
REFERENCE reorgs by partition. Many reorgs run simultaneously, for
different tablespaces and for multiple partitions within one tablespace. A
small fraction of these reorgs fail in the utilterm phase. This happens
consistently week to week on 2 DB2 subsystems, and it is annoying. Before
submitting this to IBM as a possible bug, I seek the advice of this
esteemed group of experts (how's that for sucking up?). I will break this
into two subtopics. Has anyone else experienced this? Does anyone have
any advice on either subtopic?
Thanks,
Mark McCormack
State Street Corp.
subtopic#1 the reorg failure
---------------
The reorg of a single partition fails. The sysprint msgs are not really
helpful. Example:
DSNU017I DSNUGSAT - UTILITY DATA BASE SERVICES MEMORY EXECUTION ABENDED,
REASON=X'00C90206'
DSNU016I DSNUGBAC - UTILITY BATCH MEMORY EXECUTION ABENDED,
REASON=X'00E40347'
-DIS UTILITY yields :
DSNU100I -D2G1 DSNUGDIS - USERID = OPTC
MEMBER =
UTILID = GX70229.REORG
PROCESSING UTILITY STATEMENT 1
UTILITY = REORG
PHASE = UTILTERM COUNT = 0
STATUS = STOPPED
-DIS DB yields :
NAME TYPE PART STATUS
-----------------------------------------------------------------
EDTTXN TS 025 RW,COPY,UTUT
EDTTXN01 IX 025 RW,UTUT
The reorg cntl stmt contains: STATISTICS TABLE(ALL) INDEX(ALL) KEYCARD
When the abend occurs, msgs like the following appear on DB2MSTR:
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000302
NAME DSNDB06 .SYSSTATS.X'00000391'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.24 STC31342 DSNI013I -DU12 DSNIDIFS POTENTIALLY INCONSISTENT DATA
REASON 00C90206
ERQUAL 5007
TYPE 00000303
NAME DSNDB06 .DSNTNX01.X'00000007'
CONNECTION-ID=UTILITY
CORRELATION-ID=GX70229
LUW-ID=*
10.03.25 STC31342 DSN3201I -D2G1 ABNORMAL EOT IN PROGRESS FOR USER=OPTC
CONNECTION-ID=UTILITY CORRELATION-ID=GX70229
JOBNAME=GX70229
TCB=009D0810
These msgs suggest that the problem involves updating the SYSSTATS
tablespace in the catalog (DSNDB06), which is where the inline STATISTICS
option writes its results. Since similar problems have occurred on 2
separate subsystems (no data sharing connection), I doubt there is anything
actually wrong with sysstats itself. This past weekend we ran without
STATISTICS on the reorg cntl stmts, and we had no failures. One clean
weekend is not conclusive proof, but it is getting curiouser and curiouser.
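To see whether every failure points at the same catalog objects, the DSNI013I msgs can be tallied from a saved copy of the DB2MSTR log. The following is only a minimal Python sketch, assuming the log has been downloaded as plain text in the layout shown above; the function names parse_dsni013i and summarize are made up for illustration and are not part of any DB2 tooling.

```python
import re
from collections import Counter

# Line that starts a DSNI013I message block.
HEADER = re.compile(r"\bDSNI013I\b")
# Continuation lines of the form "KEYWORD value".
FIELD = re.compile(r"^\s*(REASON|ERQUAL|TYPE|NAME)\s+(.*\S)\s*$")
# Continuation lines of the form "KEYWORD=value".
KEYVAL = re.compile(r"^\s*(CONNECTION-ID|CORRELATION-ID|LUW-ID)=(\S*)")


def parse_dsni013i(log_text):
    """Return one dict per DSNI013I message found in the log text."""
    records = []
    current = None
    for line in log_text.splitlines():
        if HEADER.search(line):
            current = {}
            records.append(current)
            continue
        if current is None:
            continue  # not inside a DSNI013I block
        match = FIELD.match(line) or KEYVAL.match(line)
        if match:
            current[match.group(1)] = match.group(2)
        else:
            current = None  # any other line ends the block
    return records


def summarize(records):
    """Count messages per (reason code, database.object) pair."""
    counts = Counter()
    for rec in records:
        # NAME looks like "DSNDB06 .SYSSTATS.X'00000391'": keep the
        # database and object names, drop the page number.
        parts = [p.strip() for p in rec.get("NAME", "").split(".")]
        counts[(rec.get("REASON", "?"), ".".join(parts[:2]))] += 1
    return counts
```

Running summarize(parse_dsni013i(text)) over several weekends of logs would show at a glance whether the failures always involve the same objects, which may be useful evidence when opening the problem with IBM.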
subtopic#2 unexpected result of reorg restart in utilterm phase
---------------
We restart the failed reorg in the utilterm phase. It completes quickly,
rc=4.
Tablespace status is changed to RW,COPY
Index status is changed to RW
The reorg cntl stmt contained: COPYDDN(SYSCOPY)
The sysprint from the failed reorg contained (in msgs from the reload
phase):
COPY PROCESSED FOR TABLESPACE DB702EDR.EDTTXN PART 25
The restarted utilterm phase:
removes UTUT from the objects' status
posts to syscopy the row for 'reorg successful'
does not post to syscopy the row for successful inline copy
leaves the tablespace partition copy pending.
Although the inline copy is created in the reload phase, I know it cannot
be posted to syscopy until after 'reorg successful' is posted to syscopy.
I assumed that both rows would be posted to syscopy in the restarted
utilterm phase. It appears that the inline copy is forgotten when the
original reorg run fails and is restarted. A separate image copy run
clears up copy pending.
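Until this is resolved, the partitions that need the follow-up image copy can be picked out of saved -DIS DB output after the restarts. This is a rough Python sketch, assuming the output has been captured as text with the four columns shown earlier (NAME, TYPE, PART, STATUS); real output carries extra columns and message ids, so treat the parsing as an assumption, and note that copy_pending_partitions is a hypothetical helper name.

```python
def copy_pending_partitions(display_text):
    """Return (tablespace, partition) pairs whose status includes COPY."""
    pending = []
    for line in display_text.splitlines():
        fields = line.split()
        # Expect: NAME TYPE PART STATUS; skip headers and other lines.
        if len(fields) < 4 or fields[1] != "TS" or not fields[2].isdigit():
            continue
        name, part, status = fields[0], int(fields[2]), fields[3]
        # STATUS is a comma-separated list such as RW,COPY,UTUT.
        if "COPY" in status.split(","):
            pending.append((name, part))
    return pending
```

The resulting list could drive generation of the separate image copy jobs. The sketch only handles partitioned tablespaces, since it requires a numeric PART column.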
Should a restarted reorg post the inline copy row to syscopy? Is this a
bug?