DB2 11 CM question

David Joyce

DB2 11 CM question

Yesterday I upgraded our z/OS DB2 environments to DB2 11 CM.  After I bring up one subsystem and press Enter every second, approximately every 45-60 seconds the system clocks for 5-6 seconds and the CPU usage of DSN1MSTR jumps to 70-80%; after that, everything returns to normal.  Has anyone encountered this?  If I bring up another DB2 11 CM subsystem, the 5-6 seconds of clocking basically doubles.  Thanks for any help, I am kind of stumped right now.

Venkat Srinivasan

RE: DB2 11 CM question
(in response to David Joyce)

I haven't seen this type of issue. Given the interval you mentioned, a wild guess is that it has to do with statistics collection. Take a dump and work it through a PMR with IBM. If you can look at the dump in IPCS, SYSTRACE may give you some pointers.

David Joyce

RE: DB2 11 CM question
(in response to Venkat Srinivasan)

Venkat, I thought of that too, but I have STATIME set to 15 minutes, although that only applies to a few IFCIDs.  How would I dump DB2 at the exact second(s) when this issue occurs?  Thanks.

Edited By:
David Joyce @ Mar 07, 2017 - 07:17 PM (America/Mountain)
David Joyce @ Mar 07, 2017 - 07:38 PM (America/Mountain)

Olle Brostrom

DB2 11 CM question
(in response to David Joyce)
David,
Since DB2 10, DB2 always uses a 1-minute interval for certain system statistics, regardless of STATIME.

Best Regards

Olle Broström

From: David Joyce
Sent: 8 March 2017 03:11
Subject: [DB2-L] - RE: DB2 11 CM question


Venkat, I thought of that too but I have the STATIME set to 15 minutes. Thanks.

-----End Original Message-----

David Joyce

RE: DB2 11 CM question
(in response to Olle Brostrom)

Found a temporary solution: I changed SMFACCT and SMFSTAT to NO.  I think this might have to do with how our SMF data sets are defined.  Checking into it.

Venkat Srinivasan

RE: DB2 11 CM question
(in response to David Joyce)

Since you can spot the problem by eyeballing it, you can set up a parmlib member IEADMCxx so that you can capture a dump quickly from the SDSF panel as soon as you see the spike.

Since disabling statistics seems to get you out of this situation, my second guess is ZOSMETRICS. Do you have ZOSMETRICS=YES coded in the zparm? If so, which SDSNLINK are you using? Can you try disabling that instead of disabling statistics altogether? You would still need a diagnostic dump.

Set up a member in a PDS in the parmlib concatenation (say, IEADMCDB):

TITLE=('OPER RQSTD DUMP OF DB2')
JOBNAME=(XCFAS,SSIDMSTR,SSIDDBM1,SSIDIRLM,SSIDDIST),
SDATA=(XESDATA,COUPLE,PSA,LPA,RGN,CSA,SUM,TRT),
DSPNAME=('SSIDIRLM'.*,'XCFAS'.*),
END

/dump parmlib=db


XCFAS and the data space aren't required unless you are running data sharing.

You can also use a symbolic, &DB2., for the subsystem ID, as in:
TITLE=('OPER RQSTD DUMP OF DB2')
JOBNAME=(XCFAS,&DB2.MSTR,&DB2.DBM1,&DB2.IRLM,&DB2.DIST),
SDATA=(XESDATA,COUPLE,PSA,LPA,RGN,CSA,SUM,TRT),
DSPNAME=('&DB2.IRLM'.*,'XCFAS'.*),
END

Then /dump parmlib=db,symdef=(&db2.=ssid)

I haven't validated the syntax. You need to test this in your sandbox system.

Alternatively, you can do the same thing with the REXX SDSF interface; a sample is shown below. There is nothing fancy here: the exec invokes SDSF at every initial-delay interval (say, 5 seconds). As soon as the CPU rate goes above a threshold (say, 10%), it switches to a shorter retry delay (say, 2 seconds). If the CPU rate is still above the threshold on every retry attempt (say, 3 times), it issues the dump command, assuming you have set up the parmlib member and coded the command. Go through the REXX and modify it as you see fit.

/*Rexx*/
Trace O                 /*Tracing off (letter O, not zero) */
Numeric Digits 64
rtncd = 0
RC    = 0
cmd.0 = 1
/********************/
cpulimit = 10          /*CPU% threshold; set to 0 to test that the command fires */
cmd.1 = "D D,T"       /*Change this command as reqd */
monitoredsystem = "SSIDMSTR"  /*monitored mstr aspace */
initdelay  = 3        /*Init  delay in seconds */
retrydelay = 3        /*Retry delay in seconds */
retrylimit = 3        /*Try count */
/********************/
connected = "N"
done     = "N"
rc    = isfcalls("ON")
If    rc  \= 0 Then Do
      Say "Batch Rexx-SDSF cannot be initialized"
      rtncd = rc
      Signal Exit_Exec
End
connected = "Y"
retry     = 0
delay = initdelay

/*For every initdelay interval invoke sdsf to get MSTR cpu rate
  until loop is done                                            */
Do Until done = "Y"
    Call Invoke_SDSF
    Call Put_Me_To_Sleep(delay)
End

Exit_Exec:
If connected = "Y" Then Do
      rc = isfcalls("OFF")
End
Exit rtncd

Invoke_SDSF:
If done = "Y" then Return
isfowner = "*"
isfprefix = monitoredsystem
isfcols   = "JNAME CPUPR"
Address SDSF "ISFEXEC DA ALL"
If    RC  \= 0 Then Do
      Say "ISF Call Returned Bad RC "RC""
      rtncd = RC
      Signal Exit_Exec
End
Do i = 1 To JNAME.0
    Say " "
    Say "Jobname " JNAME.i
    Say "CPU%    " CPUPR.i
    Say " "
    cpurate = CPUPR.i
    If cpurate >= cpulimit Then Do    /*cpurate >= cpulimit ? */
        If retry >= retrylimit Then Do  /*Retry limit reached ? */
           Call Do_Command             /*Yes, invoke command */
           done = "Y"                  /*Done with the loop */
           delay = 0                   /*Reset delay */
           Leave                       /*Stop checking further rows */
        End
        Else Do                       /*Retry limit not reached */
             retry = retry + 1        /*Increment count */
             delay = retrydelay       /*Set retry delay */
        End
    End
    Else Do
         retry = 0            /*Reset retry count */
         delay = initdelay    /*Back to the normal polling interval */
    End
End
Return

Put_Me_To_Sleep:
Arg duration            /*Duration in seconds */
If duration = 0 Then Return
Call Syscalls('ON')
   Address SYSCALL
   "sleep" duration
Call Syscalls('OFF')
Return


Do_Command:             /*Issue a slash command (MVS) */
Address SDSF "ISFSLASH (cmd.) (WAIT)"
If RC \= 0 Then Say "ISFSLASH returned RC "RC""

Do j = 1 To ISFULOG.0      /*Display o/p if any; j avoids clobbering caller's i */
    Say Strip(ISFULOG.j)
End
Return

Hopefully you can get the diag dump.