Trouble shooting high CPU on DBM1

Tom Glaser

Trouble shooting high CPU on DBM1
Hi,

I'm sure this question has been addressed over the years, so I apologize
upfront for repeating it; but I couldn't find what I needed searching
through the archives.

We have multiple subsystems that are experiencing high CPU times in DBM1.
We have Strobe, TMON/DB2, DB2 PM (DB2 PE) and Query Monitor. I think we
have enough tools to help us pinpoint the problem; I'm just not sure what to
look for.

Following is from Strobe:

                   ** MEASUREMENT SESSION DATA **
------- JOB ENVIRONMENT --------    ----- MEASUREMENT STATISTICS ----
PROGRAM MEASURED - DSNYASCP         CPS TIME PERCENT - 0.08
JOB NAME - DBN6DBM1                 WAIT TIME PERCENT - 99.92
JOB NUMBER - STC07319               RUN MARGIN OF ERROR PCT - .98
STEP NAME - IEFPROC                 CPU MARGIN OF ERROR PCT - 34.65
DATE OF SESSION - 11/30/2004        TOTAL SAMPLES TAKEN - 10,000
TIME OF SESSION - 09:05:14          TOTAL SAMPLES PROCESSED - 10,000
                                    INITIAL SAMPLING RATE- 166.67/SEC
                                    FINAL SAMPLING RATE - 166.67/SEC
SYSTEM - z/OS 01.05.00
DFSMS - 1.5.0                       SESSION TIME - 0 MIN 50.85 SEC
CPU MODEL - 2064-1C7                CPU TIME - 0 MIN 0.03 SEC
SYSTEM ID - CN00                    WAIT TIME - 0 MIN 37.47 SEC
LPAR - CN00                         STRETCH TIME - 0 MIN 13.35 SEC
64-BIT ARCHITECTURE ENABLED
                                    SRB TIME - 0 MIN 32.21 SEC
REGION SIZE BELOW 16M - 7,472K      SERVICE UNITS- 302
REGION SIZE ABOVE - 1,426,020K
                                    PAGES IN- 0 OUT- 0
PTF LVL- 3.01.FS005378/FS005433     PAGING RATE - 0.00/SEC
                                    EXCPS - 0 0.00/SEC

The sample time was only 60 seconds, or thereabouts, so 32 seconds of SRB
time is actually quite high. That's more than 50% of an engine (in this case
a 200 MIPS engine), or more than 100 MIPS of SRB time for this interval.
Our normal figure is about 6 seconds, which is 10% of the 60-second sample,
or about 20 MIPS (10% of 200 MIPS). That is about average for this subsystem.
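
The arithmetic in the paragraph above can be sketched in a few lines of Python. This is only an illustration of the calculation: the function name `mips_consumed` is made up for the example, and the inputs are the figures quoted in the post (32.21 seconds of SRB time, a sample of roughly 60 seconds, and an engine rated at about 200 MIPS).

```python
def mips_consumed(cpu_seconds, interval_seconds, engine_mips):
    """Return (fraction of one engine, approximate MIPS) used in the interval.

    Hypothetical helper for illustration; it is not part of any monitoring tool.
    """
    fraction = cpu_seconds / interval_seconds
    return fraction, fraction * engine_mips


# Figures quoted in the post; the 60-second interval is the poster's approximation.
high_fraction, high_mips = mips_consumed(32.21, 60.0, 200.0)
normal_fraction, normal_mips = mips_consumed(6.0, 60.0, 200.0)

print(f"observed: {high_fraction:.1%} of an engine, about {high_mips:.0f} MIPS")
print(f"normal:   {normal_fraction:.1%} of an engine, about {normal_mips:.0f} MIPS")
```

If the 50.85-second session time from the Strobe report is used as the interval instead of 60 seconds, the same calculation gives a somewhat higher fraction of an engine (roughly 63%).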

I'm trying to figure out what's causing this problem and I'm not sure where
to start. Can anyone provide some "rules of thumb" or guidance on where to
start troubleshooting this problem?

Thanks,

Tom Glaser
DB2 Systems Support
SBC Communications
[login to unmask email]

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Joel Goldstein

Re: Trouble shooting high CPU on DBM1
(in response to Tom Glaser)
Tom,
There isn't much detail in the Strobe output you provided.
All prefetch is charged as SRB time in this address space, so if your system
is doing large amounts of scanning, that might point you to at least part
of the cause.
Seeing which modules/CSECTs were consuming the CPU/SRB time might help point
to a cause too.
Regards,
Joel


Martin Packer

Re: Trouble shooting high CPU on DBM1
(in response to Joel Goldstein)
In addition to what Joel said...

If DB2 were having to manage its virtual storage extra carefully then it
would clock up CPU time in the DBM1 address space.

Regards, Martin

Martin Packer, MBCS CITP Martin Packer/UK/IBM
020-8832-5167 in the UK (+44) (MOBX 273643, Internal 7-325167, Mobile
07802-245584)

"Borrowing your watch and using it to tell you which way South is"

Blog:
http://[login to unmask email]


Betty@HHSDC Blake

Re: Trouble shooting high CPU on DBM1
(in response to Martin Packer)
Hi Joel,

We're looking at the CPU in DBM1 too so since I didn't see an answer from
Tom, I'll give you my info. We would appreciate any ideas you have.

DB2 V7, z/OS 1.4.
In a 10-minute Strobe measurement during a busy time, Strobe found .024 sec
of CPU, 1 minute of SRB, 7 minutes of wait and 2 minutes of stretch. There
were 2,215,366 EXCPs.

The split between CPU and SRB in the Strobe measurement appears to be
representative. The job output each week shows about 7 minutes of CPU and
7 hours of SRB time.
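
As a rough check on the figures above, the SRB share of the total and the EXCP rate can be worked out as follows. This is a sketch only: the helper name `srb_share` is invented for the example, "1 minute of SRB" is taken as exactly 60 seconds, and the weekly values are the approximate ones quoted (7 minutes and 7 hours).

```python
def srb_share(cpu_seconds, srb_seconds):
    """Fraction of the combined CPU + SRB time that is SRB (hypothetical helper)."""
    return srb_seconds / (cpu_seconds + srb_seconds)


# 10-minute Strobe measurement: .024 sec of CPU, about 1 minute of SRB.
strobe_share = srb_share(0.024, 60.0)

# Weekly job output: about 7 minutes of CPU, about 7 hours of SRB.
weekly_share = srb_share(7 * 60, 7 * 3600)

# 2,215,366 EXCPs over the 10-minute (600-second) measurement.
excp_rate = 2_215_366 / (10 * 60)

print(f"SRB share in Strobe sample: {strobe_share:.2%}")
print(f"SRB share in weekly output: {weekly_share:.2%}")
print(f"EXCP rate: about {excp_rate:.0f} per second")
```

On these numbers SRB time dominates in both the short sample and the weekly totals, although the weekly share is a little lower than in the 10-minute sample.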

The wait time was in ECA. DSNVEUS3 used most of the .024 sec of CPU. The
most intensively used procedures were SVC 026 (Catalog Management); DSNTDC
(DBASE CMD DATA MGR IFACE); and DSNBBM (Retrieve requested page).

We're looking at buffer pools and the tablespaces with a high percentage of
run time in the Strobe report. Can you think of anything else we should be
looking at?


Thank You


Joel Goldstein

Re: Trouble shooting high CPU on DBM1
(in response to Betty@HHSDC Blake)
Betty,

Every address space consumes CPU, and this happens in two modes: TCB and
SRB.

Perhaps what you designate as CPU is TCB time, based on your numbers?

When working normally, most of the total CPU consumed by DBM1 should be SRB.
The functions that are charged as TCB are Syslgrng, Open/Close, and Space
Mgt.
I don't think Strobe is breaking the CPU usage into TCB vs. SRB for its
functions such as Catalog Mgt.
This could be a combination of catalog functions. Do you have datasets
taking multiple extents?

I don't have access to the DB2 detail doc since I'm away at a conference.
DSNBBM sounds like a Strobe generic name for Buffer Manager functions.
If I remember correctly, DSNB1GET is the module that is specifically
responsible for "Retrieve requested page".

I have a chart that illustrates where the CPU for each function is charged,
and in which mode.
This is a PDF file that I can email to you if you wish.

Identifying your objects with high sequential scan activity is an
opportunity for major performance and CPU reductions.
Objects that are not very large but are scanned frequently (SP) may not
show up on any normal reports.
It's only when you look at pool activity over time that they may jump out:
for example, small or medium objects that are scanned thousands of times,
where each individual scan is not significant.
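
The idea of looking at activity over time rather than per interval can be sketched as below. Everything in this example is hypothetical: the object names, the interval samples, and the function `rank_by_cumulative_pages` are invented for illustration and do not come from any monitor's actual output format.

```python
from collections import defaultdict

# Hypothetical interval samples:
# (interval start, object name, prefetch requests, pages read by prefetch)
samples = [
    ("09:00", "TS_SMALL", 400, 3200),
    ("09:00", "TS_LARGE", 2, 6400),
    ("09:05", "TS_SMALL", 420, 3360),
    ("09:05", "TS_LARGE", 0, 0),
    ("09:10", "TS_SMALL", 410, 3280),
    ("09:10", "TS_LARGE", 1, 3200),
]


def rank_by_cumulative_pages(records):
    """Sum requests and pages per object across all intervals, highest pages first."""
    totals = defaultdict(lambda: [0, 0])
    for _interval, obj, requests, pages in records:
        totals[obj][0] += requests
        totals[obj][1] += pages
    rows = [(obj, reqs, pages) for obj, (reqs, pages) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranking = rank_by_cumulative_pages(samples)
for obj, requests, pages in ranking:
    print(f"{obj}: {requests} prefetch requests, {pages} pages")
```

In this made-up data the large object reads the most pages in any single interval, but the small object, scanned hundreds of times per interval, comes out ahead once the intervals are added together.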

Regards,
Joel