Disproportional CPU Hike

Jorg Lueke

Disproportional CPU Hike

This is on a DB2 z/os 11 system. There's a process the most expensive of which is a select on a table with a subselect on the same table which is then unioned. So the table is read four times per input record. The table is small 7 million rows  500 byte records. When the job runs with 1 month of records 2.5 million opens 4.5 million fetches. With 3 months 7.5 million opens 14.5 million fetches. No i/o everything is coming out of the buffers.

 

One month run takes 3 hours of CPU in DB2, three months 30 hours. Is it just a function of 3 times as much data and the table being read four times so 3*4=12 times the expected CPU? Or could it be something else.

The accounting report shows nothing untoward outside of the CPU usage

Michael Hannan

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Jorg,

I would be better placed to give some ideas if you published the SQL and certainly the access path would help.

Also indicate if a rewrite of SQL is practical. considering testing costs etc.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Michael Hannan)

The access path looks like this. The 3rd and 5th row are also prefetch L. The big question is why a tenfold increase of CPU with a threefold increase in data.  The code is provided so it's not a rewrite candidate.

The form of the SQL is just:

 

SELECT FROM TABLE1 WHERE ID IN (SELECT MASTER_ID FROM TABLE1 WHERE VAR = :HV AND VAR2 BETWEEN :HV AND :HV)

UNION

SELECT FROM TABLE1 WHERE ID IN (SELECT OTHER_ID FROM TABLE1 WHERE VAR = :HV AND VAR2 BETWEEN :HV AND :HV)

QBLOCKNO  PLANNO  METHOD   TABNO  METHOD  MATCHCOLS  SORTC_UNIQ  SORTC_JOIN
---------+---------+---------+---------+---------+---------+---------+------
       1       1       3       0       3          0  Y           N         
       2       2       1       1       1          1  N           N         
       2       1       0       2       0          1  N           N         
       4       2       1       3       1          1  N           Y         
       4       1       0       4       0          1  N           N   

alain pary

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Hello, 

do you try to modify your statement by using exists 

SELECT FROM TABLE1 WHERE exists  (SELECT 1 FROM TABLE1 WHERE VAR = :HV AND VAR2 BETWEEN :HV AND :HV   and MASTER_ID = ID)

and index var1,master_id and var2 can be usefull .

 

regards 

 

Alain

Joe Geller

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

My first thought it that tripling the data giving a 10 fold increase in time sounds like a full scan of the inner table of the join (3 * 3).

I assume that VAR is a column of the table?  Since the Method is 1 (NL join) the next question is which is the inner table and which is the outer.  Since Db2 has rewritten the IN to a join, it could be either.  You have matchcols=1, so you could tell which is the inner and outer by which index is used (and it could be a different index for the two parts of the UNION).   In the first part, it could be an index on VAR or it could be an index on MASTER_ID..  What is the cardinality of VAR, MASTER_ID and OTHER_ID? 

If the index used is on a high cardinality column then I would only expect a 3 fold increase in time.  If it is a low cardinality column, then for each row of the outer table, Db2 will have to scan 3 times as much data from the inner.

I notice also that the second part of the UNION has a SORTC_JOIN, but the first part does not.  That suggests (guessing here) that the first part may be using a unique index and the second part a low cardinality index, thus requiring a sort to remove duplicates from the join (to make it equivalent to the IN.

Can you provide us that additional information (indexes used and key column and cardinality)?

Joe

 

In Reply to Jorg Lueke:

The access path looks like this. The 3rd and 5th row are also prefetch L. The big question is why a tenfold increase of CPU with a threefold increase in data.  The code is provided so it's not a rewrite candidate.

The form of the SQL is just:

 

SELECT FROM TABLE1 WHERE ID IN (SELECT MASTER_ID FROM TABLE1 WHERE VAR = :HV AND VAR2 BETWEEN :HV AND :HV)

UNION

SELECT FROM TABLE1 WHERE ID IN (SELECT OTHER_ID FROM TABLE1 WHERE VAR = :HV AND VAR2 BETWEEN :HV AND :HV)

QBLOCKNO  PLANNO  METHOD   TABNO  METHOD  MATCHCOLS  SORTC_UNIQ  SORTC_JOIN
---------+---------+---------+---------+---------+---------+---------+------
       1       1       3       0       3          0  Y           N         
       2       2       1       1       1          1  N           N         
       2       1       0       2       0          1  N           N         
       4       2       1       3       1          1  N           Y         
       4       1       0       4       0          1  N           N   

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Joe Geller)

Hopefully this formatting comes through. It has the SQL on top. Access path wirth degree 1 and any, the current program. It does use parallelism. None of these indexed columns are clustered at all (<10%)

 

 SET CURRENT DEGREE = '1'; 
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
EXPLAIN PLAN SET QUERYNO = 88888 FOR
(SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE
(A.ROOT_MSG_ID IN
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID = B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ? )
AND A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL))
UNION
( SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE
(A.MSG_ID IN
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID <> B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?)
AND A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL))
ORDER BY MSG_ID
;
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
EXPLAIN DEGREE 1
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ SORTC_JOIN PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E(ROOT_MSG_ID CARD 4.5M) 2 1 1 1 1 N N -------------
2 CITI942I 1 0 2 0 1 N N L -------------
4 CITI942A(MSG_ID UNIQUE CARD 7M) 2 4 3 4 1 N Y L -------------
4 CITI942I(SCCF_ID) CARD 3.5M 1 0 4 0 1 N N L -------------
SET CURRENT DEGREE = 'ANY'; 00060036
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ SORTC_JOIN PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E 2 1 1 1 1 N N 10
2 CITI942I 1 0 2 0 1 N N L 10
4 CITI942A 2 1 3 1 1 N Y 18
4 CITI942I 1 0 4 0 1 N N L 10
CURRENT PROGRAM
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E 2 1 1 1 1 N N 10
2 CITI942I 1 0 2 0 1 N N 10
4 CITI942A 2 1 3 1 1 N Y 18
4 CITI942I 1 0 4 0 1 N N 10

alain pary

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

hello, 

another proposition 

SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE (EXISTS
(SELECT 1
FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID = B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?
AND A.ROOT_MSG_ID =B.MSG_ID)
OR EXISTS
(SELECT
FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID <> B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?
AND A.MSG = B.MSG_ID))
AND A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL
ORDER BY MSG_ID

I hope it helps 

 

Alain

Jorg Lueke

RE: Disproportional CPU Hike
(in response to alain pary)

Thanks Alain

 

i’ll run it through the Explain and see what the optimizer says.

 

Should the parallelism matter?

alain pary

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Hello, 

the best it's to make the same exercices as befor with parallelism 1 or any .

try to create some virtual indexes with the following columns   

MSG_ID 
SCCF_

ROOT_MSG_ID

in this case the exists condition can be index only. 

regards 

 

Alain

Joe Geller

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Jorg,

In the original post you said that there are 2.5 million opens for a month.  Does that mean the query is called 2.5 million times?  If so, what is different for each call?  You have variables for two predicates.  In the subquery you have

B.SCCF_ID BETWEEN ? AND ?

 

In the outer query you have BUS_OWNER_ID=?

what values are passed in for the range?  Db2 is starting with the subquery using the index on SCCF_ID.  ROOT_MSG_ID is not unique.  Does 3 months of data have 3 times as many root_msg_ids or does it have 3 times as many rows for each root_msg_id?  If it is the latter, then this could be the 9 times increase - 3 times as many rows from the subquery joined to 3 times as many rows for a root_msg_id.

However, 3 hours of CPU is awfully high for 1 month with 4.5 million fetches.  How big a range is passed in for SCCF_ID?  If it is a very large range, then for each call DB2 is doing a lot of work.

Joe

 

In Reply to Jorg Lueke:

Hopefully this formatting comes through. It has the SQL on top. Access path wirth degree 1 and any, the current program. It does use parallelism. None of these indexed columns are clustered at all (<10%)

 

 SET CURRENT DEGREE = '1'; 
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
EXPLAIN PLAN SET QUERYNO = 88888 FOR
(SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE
(A.ROOT_MSG_ID IN
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID = B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ? )
AND A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL))
UNION
( SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE
(A.MSG_ID IN
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID <> B.ROOT_MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?)
AND A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL))
ORDER BY MSG_ID
;
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 0
EXPLAIN DEGREE 1
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ SORTC_JOIN PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E(ROOT_MSG_ID CARD 4.5M) 2 1 1 1 1 N N -------------
2 CITI942I 1 0 2 0 1 N N L -------------
4 CITI942A(MSG_ID UNIQUE CARD 7M) 2 4 3 4 1 N Y L -------------
4 CITI942I(SCCF_ID) CARD 3.5M 1 0 4 0 1 N N L -------------
SET CURRENT DEGREE = 'ANY'; 00060036
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ SORTC_JOIN PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E 2 1 1 1 1 N N 10
2 CITI942I 1 0 2 0 1 N N L 10
4 CITI942A 2 1 3 1 1 N Y 18
4 CITI942I 1 0 4 0 1 N N L 10
CURRENT PROGRAM
QBLOCKNO ACCESSNAME PLANNO METHOD TABNO METHOD MATCHCOLS SORTC_UNIQ PREFETCH ACCESS_DEGREE
1 1 3 0 3 0 Y N -------------
2 CITI942E 2 1 1 1 1 N N 10
2 CITI942I 1 0 2 0 1 N N 10
4 CITI942A 2 1 3 1 1 N Y 18
4 CITI942I 1 0 4 0 1 N N 10

Michael Hannan

RE: Disproportional CPU Hike
(in response to alain pary)

Jorg,

Joe asks some good questions. I think we need to know the Unique key and the cluster index and cardinalities of predicate columns.

Does cursor really get opened 2.5 M times for one hour of data, with different BUS_OWNER_ID values? sorted in sequence or random? I don't think Parallel should be needed. BTW, Parallel path can't do Hybrid Join (not supported). 

Death by random I/O with poorly clustered indexes and not Index Only processing. Some List Prefetch might have helped only slightly.

I think the joins were in wrong sequence with TABNOs 2 and 4 accessed first in the joins. DB2 misunderstands the filtering of range predicates.

Like others I think the EXISTS subquery might be better and can avoid the UNION that way and join in the wrong sequence.

How about this:
First rewrite:

SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL
AND (EXISTS
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID = B.ROOT_MSG_ID -- Stage 2 predicate
AND A.ROOT_MSG_ID = B.MSG_ID
AND B.SCCF_ID BETWEEN ? AND ? )

OR EXISTS
(SELECT B.MSG_ID FROM PRD10.IPP_MSG_SEARCH B
WHERE B.MSG_ID <> B.ROOT_MSG_ID
AND A.MSG_ID = B.MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?)
)
ORDER BY MSG_ID
;

2nd rewrite, convert Stage 2 predicate to stage 1, 
and notice there are cases when the original record already satisfies the subquery, so subquery need not be executed.
If A.SCCF_ID satisfies the BETWEEN , then one of the 2 subqueries must find a row that exists, I believe

SELECT *
FROM PRD10.IPP_MSG_SEARCH A
WHERE A.BUS_OWNER_ID = ?
AND A.PARNT_MSG_ID IS NULL

AND (A.SCCF_ID BETWEEN ? AND ?

OR EXISTS
(SELECT 1 FROM PRD10.IPP_MSG_SEARCH B
WHERE A.ROOT_MSG_ID = B.ROOT_MSG_ID -- Indexable predicate
AND A.ROOT_MSG_ID = B.MSG_ID
AND B.SCCF_ID BETWEEN ? AND ? ) 

OR EXISTS
(SELECT 1 FROM PRD10.IPP_MSG_SEARCH B
WHERE A.MSG_ID <> B.ROOT_MSG_ID   -- Not Stage 2
AND A.MSG_ID = B.MSG_ID
AND B.SCCF_ID BETWEEN ? AND ?)

)
ORDER BY MSG_ID
;

Make the following Indexes:
1. for Outer query: PARNT_MSG_ID, BUS_OWNER_ID, MSG_ID, ROOT_MSG_ID
2. for 1st EXISTS: MSG_ID, ROOT_MSG_ID, SCCF_ID "Ix Only" Make it Unique on first column if it is unique.
3. for 2nd EXISTS: MSG_ID, SCCF_ID, ROOT_MSG_ID only needed if MSG_ID not unique. 

If MSG_ID is a Unique key, then i think the last EXISTS subquery becomes redundant and can be omitted.


Michael Hannan,

DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Apr 28, 2019 - 02:15 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Apr 28, 2019 - 02:16 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Apr 28, 2019 - 02:23 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Apr 28, 2019 - 02:27 PM (Europe/Berlin)

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Joe Geller)

Joe

Business owner id has a colcard of 1, so what changes each open of the cursor is the SCCF ID range.

The batch job also never commits, it is read only but I don't know if that impacts the buffers.

I did see that the IX BP was hitting the DMTH. Increasing the BP did cut the CPU cost in half (.7 CPUs to .35 CPUS)\

The performance trace shows this order of events:

Start sort: a few tenths of a second

Index scan: a few miliseconds reads 4 pages

Start second sort: a few tenths of a second

Index scan: a few milliseconds reads in a few qualifying rows

I'm still not convinced that the parallelism is helping either.

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Michael Hannan)

Hi Michael

The application group is very averse to rewriting anything :)

They just want to get this over the finish line without any changes.

Joe Geller

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Jorg,

Hitting DMTH is not good. Robert Catterall has a good example of what can happen (not your case, but illustrates the problem) - http://robertsdb2blog.blogspot.com/2018/05/db2-for-zos-buffer-pools-index-root.html

I don't quite understand your numbers.  Your first post said that 2.5 million opens took 3 hours of CPU.  Was the elapsed time 3 hours (i.e. all CPU time) or was it longer (and if so, how much)?  Each hour is 3600 seconds so 3 hours is 10800 seconds.  That is 4.3 ms per open/fetches.

In your last post, the performance trace showed a few (times 2) tenths of a second.  Is that per open?  Is it elapsed time or cpu time?  Is it for the first open or the average?  Is it after you increased the bufferpool or before?  Is DMTH not reached after the increase?   What makes it unclear is that a few tenths of s second is much more than 4.3 ms.  Also, a few ms to read 4 index pages sounds more like there was at least 1 I/O, so again, not sure of how many I/O you had before and after the bp increase and not clear whether this is the average or just one open.

Joe

In Reply to Jorg Lueke:

Joe

Business owner id has a colcard of 1, so what changes each open of the cursor is the SCCF ID range.

The batch job also never commits, it is read only but I don't know if that impacts the buffers.

I did see that the IX BP was hitting the DMTH. Increasing the BP did cut the CPU cost in half (.7 CPUs to .35 CPUS)\

The performance trace shows this order of events:

Start sort: a few tenths of a second

Index scan: a few miliseconds reads 4 pages

Start second sort: a few tenths of a second

Index scan: a few milliseconds reads in a few qualifying rows

I'm still not convinced that the parallelism is helping either.

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Joe Geller)

Joe

This is the package accounting level info for 1 month 4/6 versus 3 months 4/21 4.23. Elapsed and CPU time are always almost identical from the package with the issues

 

DATE         CLASS7_ELAPSED  CLASS7_CPU_AGENT          SQL 
2019-04-21    137201.354016      98050.358776      1167287 
2019-04-23    131247.169480      98706.486882      1167419 
2019-04-06      8632.842356       7751.224958       385651 
2019-04-06      8622.252763       7642.846307       386870 

This is the full accouting data showing the 2.5 mil versus 7.5 mil opens for the process

 

        ELAPSED              CPU          CPUPARA            DB_IO         OPEN
  141434.295930     99502.479572       752.091430      1779.540662      7428040
  135918.252677    100178.783342       810.970154      2421.306680      7427985
   10134.699711      8180.157862       320.663925       896.791930      2450750
    9837.329273      8052.358248       264.561706       581.314926      2457451

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Here's what a single execution of the problem package looks like with multiple months data

 

Event Time Event Type                Event Resource Information
13:32:45.728 START OPEN CURSOR     PGM=ITDB22S1 STMT=00892         
13:32:45.728 START OF SORT                                          
13:32:46.178 START INDEX SCAN      DB=CITD09P0  PS=CITI942I        
13:32:46.206 END OF INDEX SCAN     DB=CITD09P0  PS=CITI942I        
                                  ROWS=0000000 PAGES=0000004      
13:32:46.206 START OF SORT                                          
13:32:46.618 START INDEX SCAN      DB=CITD09P0  PS=CITI942I        
13:32:46.643 END OF INDEX SCAN     DB=CITD09P0  PS=CITI942I           
                                  ROWS=0000000 PAGES=0000004         
13:32:46.643 END SORT              RECS=0000000000000000 RSIZE=0000064
13:32:46.643 END SORT              RECS=0000000000000000 RSIZE=0001044
13:32:46.643 END OPEN CURSOR       ROWS=0000000 PAGES=0000008      

Joel Goldstein

Disproportional CPU Hike
(in response to Jorg Lueke)
If the pool is hitting DMTH, then VDWQT is too high. Maybe DWQT also.



If there are many objects in the pool, I usually use (0,256) as the setting for vdwqt, which keeps a trickle write going,



What is the IO rate/sec for that pool, and what’s the avg IO elapsed times by type of IO?





Joel Goldstein
Responsive Systems
Buffer Pool Tool(R) for DB2, the worldwide industry standard

Predicts the IO rate/Sec for tuning changes
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com http://www.responsivesystems.com
tel. (732) 972-1261
fax.(732) 972-9416

[login to unmask email]



From: Jorg Lueke [mailto:[login to unmask email]
Sent: Tuesday, April 30, 2019 8:50 AM
To: [login to unmask email]
Subject: [DB2-L] - RE: Disproportional CPU Hike



Joe

Business owner id has a colcard of 1, so what changes each open of the cursor is the SCCF ID range.

The batch job also never commits, it is read only but I don't know if that impacts the buffers.

I did see that the IX BP was hitting the DMTH. Increasing the BP did cut the CPU cost in half (.7 CPUs to .35 CPUS)\

The performance trace shows this order of events:

Start sort: a few tenths of a second

Index scan: a few miliseconds reads 4 pages

Start second sort: a few tenths of a second

Index scan: a few milliseconds reads in a few qualifying rows

I'm still not convinced that the parallelism is helping either.



-----End Original Message-----

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Joel Goldstein)

Here are some BP stats

Getpages per Sync I/O . :      9.81    Pages Written per Write I/O :    1.04
Prefetch per I/O . . . .:      1.20    Pages Read per Prefetch . . :   25.07
Seq Prefetch per I/O . .:      1.36    Pages Read per Seq Prefetch :   40.31
List Prefetch per I/O . :      1.02    Pages Read per List Prefetch:   24.18
Dyn Prefetch per I/O . .:      1.21    Pages Read per Dyn Prefetch :   25.13
Max Concur Prefetch . . :        36    Workfile Maximum . . . . . .:       0
BP Hit % - Random . . . :     88.1%    Virtual Page Steal Method . :     LRU
BP Hit % - Sequential . :     10.0%                                        
 
              DATA SET #:    1 USE COUNT:     1                 
DSNB453I  <DB2P   VP CACHED PAGES -                              
               CURRENT       =  5058   MAX             =  5058  
               CHANGED       =     0   MAX             =     0  
DSNB455I  <DB2P   SYNCHRONOUS I/O DELAYS -                      
               AVERAGE DELAY =     3   MAXIMUM DELAY   =    59  
               TOTAL PAGES   =   892                            
DSNB467I  <DB2P  STATISTICS FOR INDEX SPACE CITD09P0.CITI942I   
INSTANCE 1 -                                                    
              DATA SET #:    1 USE COUNT:     1                 
DSNB453I  <DB2P   VP CACHED PAGES -                             
               CURRENT       = 13077   MAX             = 13077  
               CHANGED       =     0   MAX             =     0  
DSNB455I  <DB2P   SYNCHRONOUS I/O DELAYS -                      
               AVERAGE DELAY =     2   MAXIMUM DELAY   =    18  
               TOTAL PAGES   =   297                            
DSNB467I  <DB2P  STATISTICS FOR INDEX SPACE CITD09P0.CITI942A   

Joel Goldstein

Disproportional CPU Hike
(in response to Jorg Lueke)
Anything like std Stats report showing elapsed time for the data period, pool size, etc?



This indicates that the updated pages are very far apart, physically…

Pages Written per Write I/O : 1.04







Joel Goldstein
Responsive Systems
Buffer Pool Tool(R) for DB2, the worldwide industry standard

Predicts the IO rate/Sec for tuning changes
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com http://www.responsivesystems.com
tel. (732) 972-1261
fax.(732) 972-9416

[login to unmask email]



From: Jorg Lueke [mailto:[login to unmask email]
Sent: Tuesday, April 30, 2019 12:48 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: Disproportional CPU Hike



Here are some BP stats

Getpages per Sync I/O . : 9.81 Pages Written per Write I/O : 1.04
Prefetch per I/O . . . .: 1.20 Pages Read per Prefetch . . : 25.07
Seq Prefetch per I/O . .: 1.36 Pages Read per Seq Prefetch : 40.31
List Prefetch per I/O . : 1.02 Pages Read per List Prefetch: 24.18
Dyn Prefetch per I/O . .: 1.21 Pages Read per Dyn Prefetch : 25.13
Max Concur Prefetch . . : 36 Workfile Maximum . . . . . .: 0
BP Hit % - Random . . . : 88.1% Virtual Page Steal Method . : LRU
BP Hit % - Sequential . : 10.0%

DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 5058 MAX = 5058
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 3 MAXIMUM DELAY = 59
TOTAL PAGES = 892
DSNB467I
INSTANCE 1 -
DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 13077 MAX = 13077
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 2 MAXIMUM DELAY = 18
TOTAL PAGES = 297
DSNB467I



-----End Original Message-----

Jorg Lueke

RE: Disproportional CPU Hike
(in response to Joel Goldstein)

They have this. It seems like the job keeps running and slowly adding pages to the BP, The VDWQT was changed to 0,256

 

BP12   GENERAL               QUANTITY  /SECOND  /THREAD  /COMMIT
---------------------------  --------  -------  -------  -------
CURRENT ACTIVE BUFFERS        9968.03      N/A      N/A      N/A
UNAVAIL.BUFFER-VPOOL FULL        0.00     0.00     0.00     0.00
NUMBER OF DATASET OPENS          0.00     0.00     0.00     0.00
BUFFERS ALLOCATED - VPOOL    25257.44      N/A      N/A      N/A
DFHSM MIGRATED DATASET           0.00     0.00     0.00     0.00
DFHSM RECALL TIMEOUTS            0.00     0.00     0.00     0.00
VPOOL EXPANS. OR CONTRACT.       1.00     0.00     0.00     0.00
VPOOL EXPANS. FAILURES           0.00     0.00     0.00     0.00
CONCUR.PREF.I/O STREAMS-HWM     18.00      N/A      N/A      N/A
PREF.I/O STREAMS REDUCTION       0.00     0.00     0.00     0.00
PARALLEL QUERY REQUESTS       8290.00     2.34    15.94     0.42
PARALL.QUERY REQ.REDUCTION       0.00     0.00     0.00     0.00
PREF.QUANT.REDUCED TO 1/2        0.00     0.00     0.00     0.00
PREF.QUANT.REDUCED TO 1/4        0.00     0.00     0.00     0.00
NUMBER OF LPL INSERTS            0.00     0.00     0.00     0.00
MIN BUFFERS ON SLRU              0.00      N/A      N/A      N/A
MAX BUFFERS ON SLRU              0.00      N/A      N/A      N/A
SLRU LENGTH EQUALS VPSEQT        0.00     0.00     0.00     0.00
GETPAGE REQU RANDOM ON SLRU      0.00     0.00     0.00     0.00
 
BP12   READ OPERATIONS       QUANTITY  /SECOND  /THREAD  /COMMIT
---------------------------  --------  -------  -------  -------
BPOOL HIT RATIO (%)             91.39                          
BPOOL HIT RATIO (%) SEQU          N/C                          
BPOOL HIT RATIO (%) RANDOM      91.39                           
GETPAGE REQUEST              76785.00    21.70   147.66     3.85
GETPAGE REQUEST-SEQUENTIAL       0.00     0.00     0.00     0.00
GETPAGE REQUEST-RANDOM       76785.00    21.70   147.66     3.85
SYNCHRONOUS READS             6612.00     1.87    12.72     0.33
SYNCHRON. READS-SEQUENTIAL       0.00     0.00     0.00     0.00
SYNCHRON. READS-RANDOM        6612.00     1.87    12.72     0.33
GETPAGE PER SYN.READ-RANDOM     11.61                          
SEQUENTIAL PREFETCH REQUEST      0.00     0.00     0.00     0.00
SEQUENTIAL PREFETCH READS        0.00     0.00     0.00     0.00
PAGES READ VIA SEQ.PREFETCH      0.00     0.00     0.00     0.00
S.PRF.PAGES READ/S.PRF.READ       N/C                          
LIST PREFETCH REQUESTS           0.00     0.00     0.00     0.00
LIST PREFETCH READS              0.00     0.00     0.00     0.00
PAGES READ VIA LIST PREFTCH      0.00     0.00     0.00     0.00
L.PRF.PAGES READ/L.PRF.READ       N/C                          
DYNAMIC PREFETCH REQUESTED       0.00     0.00     0.00     0.00
DYNAMIC PREFETCH READS           0.00     0.00     0.00     0.00
PAGES READ VIA DYN.PREFETCH      0.00     0.00     0.00     0.00
D.PRF.PAGES READ/D.PRF.READ       N/C                          
PREF.DISABLED-NO BUFFER          0.00     0.00     0.00     0.00
PREF.DISABLED-NO READ ENG        0.00     0.00     0.00     0.00
PAGE-INS REQUIRED FOR READ    3612.00     1.02     6.95     0.18
 
BP12   WRITE OPERATIONS      QUANTITY  /SECOND  /THREAD  /COMMIT
---------------------------  --------  -------  -------  -------
BUFFER UPDATES                   0.00     0.00     0.00     0.00
PAGES WRITTEN                    0.00     0.00     0.00     0.00
BUFF.UPDATES/PAGES WRITTEN        N/C                           
SYNCHRONOUS WRITES               0.00     0.00     0.00     0.00
ASYNCHRONOUS WRITES              0.00     0.00     0.00     0.00
PAGES WRITTEN PER WRITE I/O       N/C                           
PAGES WRTN FOR CASTOUT I/O       0.00     0.00     0.00     0.00
NUMBER OF CASTOUT I/O            0.00     0.00     0.00     0.00
HORIZ.DEF.WRITE THRESHOLD        0.00     0.00     0.00     0.00
VERTI.DEF.WRITE THRESHOLD        0.00     0.00     0.00     0.00
DM THRESHOLD                     0.00     0.00     0.00     0.00
PAGE-INS REQUIRED FOR WRITE      0.00     0.00     0.00     0.00



In Reply to Joel Goldstein:

Anything like std Stats report showing elapsed time for the data period, pool size, etc?



This indicates that the updated pages are very far apart, physically…

Pages Written per Write I/O : 1.04







Joel Goldstein
Responsive Systems
Buffer Pool Tool(R) for DB2, the worldwide industry standard

Predicts the IO rate/Sec for tuning changes
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com http://www.responsivesystems.com
tel. (732) 972-1261
fax.(732) 972-9416

[login to unmask email]



From: Jorg Lueke [mailto:[login to unmask email]
Sent: Tuesday, April 30, 2019 12:48 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: Disproportional CPU Hike



Here are some BP stats

Getpages per Sync I/O . : 9.81 Pages Written per Write I/O : 1.04
Prefetch per I/O . . . .: 1.20 Pages Read per Prefetch . . : 25.07
Seq Prefetch per I/O . .: 1.36 Pages Read per Seq Prefetch : 40.31
List Prefetch per I/O . : 1.02 Pages Read per List Prefetch: 24.18
Dyn Prefetch per I/O . .: 1.21 Pages Read per Dyn Prefetch : 25.13
Max Concur Prefetch . . : 36 Workfile Maximum . . . . . .: 0
BP Hit % - Random . . . : 88.1% Virtual Page Steal Method . : LRU
BP Hit % - Sequential . : 10.0%

DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 5058 MAX = 5058
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 3 MAXIMUM DELAY = 59
TOTAL PAGES = 892
DSNB467I
INSTANCE 1 -
DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 13077 MAX = 13077
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 2 MAXIMUM DELAY = 18
TOTAL PAGES = 297
DSNB467I



-----End Original Message-----

Joel Goldstein

Disproportional CPU Hike
(in response to Jorg Lueke)
Jorg,



When you say increasing buffers, do you mean

CURRENT ACTIVE BUFFERS 9968.03

Which is the number of buffers UNAVAILABLE because they are updated and not written out…

OR

BUFFERS ALLOCATED - VPOOL 25257.44

Which is the size of the pool



Current active buffers seems rather large if vdqwt=(0,256), unless there are a lot of objects being updated -- and there were no updates indicated in this set of data under the Write Operations Section.

(0,256) could allow about 40 or more to have less than 256 updated pages in the pool and not be writing anything out.





PAGE-INS REQUIRED FOR READ 3612.

Usually means either the processor memory is over committed, or the DBM1 address space is not running at the proper priority.

How much memory is on the LPAR, and what is the Free Frame count – is the system paging at all?



This data seems to be a (roughly) 6 minute time slice, but not a time when the application you’re working on was doing very much.



Joel



Joel Goldstein
Responsive Systems
Buffer Pool Tool(R) for DB2, the worldwide industry standard

Predicts the IO rate/Sec for tuning changes
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com http://www.responsivesystems.com
tel. (732) 972-1261
fax.(732) 972-9416

[login to unmask email]



From: Jorg Lueke [mailto:[login to unmask email]
Sent: Wednesday, May 01, 2019 7:58 AM
To: [login to unmask email]
Subject: [DB2-L] - RE: Disproportional CPU Hike



They have this. It seems like the job keeps running and slowly adding pages to the BP, The VDWQT was changed to 0,256



BP12 GENERAL QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
CURRENT ACTIVE BUFFERS 9968.03 N/A N/A N/A
UNAVAIL.BUFFER-VPOOL FULL 0.00 0.00 0.00 0.00
NUMBER OF DATASET OPENS 0.00 0.00 0.00 0.00
BUFFERS ALLOCATED - VPOOL 25257.44 N/A N/A N/A
DFHSM MIGRATED DATASET 0.00 0.00 0.00 0.00
DFHSM RECALL TIMEOUTS 0.00 0.00 0.00 0.00
VPOOL EXPANS. OR CONTRACT. 1.00 0.00 0.00 0.00
VPOOL EXPANS. FAILURES 0.00 0.00 0.00 0.00
CONCUR.PREF.I/O STREAMS-HWM 18.00 N/A N/A N/A
PREF.I/O STREAMS REDUCTION 0.00 0.00 0.00 0.00
PARALLEL QUERY REQUESTS 8290.00 2.34 15.94 0.42
PARALL.QUERY REQ.REDUCTION 0.00 0.00 0.00 0.00
PREF.QUANT.REDUCED TO 1/2 0.00 0.00 0.00 0.00
PREF.QUANT.REDUCED TO 1/4 0.00 0.00 0.00 0.00
NUMBER OF LPL INSERTS 0.00 0.00 0.00 0.00
MIN BUFFERS ON SLRU 0.00 N/A N/A N/A
MAX BUFFERS ON SLRU 0.00 N/A N/A N/A
SLRU LENGTH EQUALS VPSEQT 0.00 0.00 0.00 0.00
GETPAGE REQU RANDOM ON SLRU 0.00 0.00 0.00 0.00

BP12 READ OPERATIONS QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
BPOOL HIT RATIO (%) 91.39
BPOOL HIT RATIO (%) SEQU N/C
BPOOL HIT RATIO (%) RANDOM 91.39
GETPAGE REQUEST 76785.00 21.70 147.66 3.85
GETPAGE REQUEST-SEQUENTIAL 0.00 0.00 0.00 0.00
GETPAGE REQUEST-RANDOM 76785.00 21.70 147.66 3.85
SYNCHRONOUS READS 6612.00 1.87 12.72 0.33
SYNCHRON. READS-SEQUENTIAL 0.00 0.00 0.00 0.00
SYNCHRON. READS-RANDOM 6612.00 1.87 12.72 0.33
GETPAGE PER SYN.READ-RANDOM 11.61
SEQUENTIAL PREFETCH REQUEST 0.00 0.00 0.00 0.00
SEQUENTIAL PREFETCH READS 0.00 0.00 0.00 0.00
PAGES READ VIA SEQ.PREFETCH 0.00 0.00 0.00 0.00
S.PRF.PAGES READ/S.PRF.READ N/C
LIST PREFETCH REQUESTS 0.00 0.00 0.00 0.00
LIST PREFETCH READS 0.00 0.00 0.00 0.00
PAGES READ VIA LIST PREFTCH 0.00 0.00 0.00 0.00
L.PRF.PAGES READ/L.PRF.READ N/C
DYNAMIC PREFETCH REQUESTED 0.00 0.00 0.00 0.00
DYNAMIC PREFETCH READS 0.00 0.00 0.00 0.00
PAGES READ VIA DYN.PREFETCH 0.00 0.00 0.00 0.00
D.PRF.PAGES READ/D.PRF.READ N/C
PREF.DISABLED-NO BUFFER 0.00 0.00 0.00 0.00
PREF.DISABLED-NO READ ENG 0.00 0.00 0.00 0.00
PAGE-INS REQUIRED FOR READ 3612.00 1.02 6.95 0.18

BP12 WRITE OPERATIONS QUANTITY /SECOND /THREAD /COMMIT
--------------------------- -------- ------- ------- -------
BUFFER UPDATES 0.00 0.00 0.00 0.00
PAGES WRITTEN 0.00 0.00 0.00 0.00
BUFF.UPDATES/PAGES WRITTEN N/C
SYNCHRONOUS WRITES 0.00 0.00 0.00 0.00
ASYNCHRONOUS WRITES 0.00 0.00 0.00 0.00
PAGES WRITTEN PER WRITE I/O N/C
PAGES WRTN FOR CASTOUT I/O 0.00 0.00 0.00 0.00
NUMBER OF CASTOUT I/O 0.00 0.00 0.00 0.00
HORIZ.DEF.WRITE THRESHOLD 0.00 0.00 0.00 0.00
VERTI.DEF.WRITE THRESHOLD 0.00 0.00 0.00 0.00
DM THRESHOLD 0.00 0.00 0.00 0.00
PAGE-INS REQUIRED FOR WRITE 0.00 0.00 0.00 0.00



In Reply to Joel Goldstein:

Anything like std Stats report showing elapsed time for the data period, pool size, etc?



This indicates that the updated pages are very far apart, physically…

Pages Written per Write I/O : 1.04







Joel Goldstein
Responsive Systems
Buffer Pool Tool(R) for DB2, the worldwide industry standard

Predicts the IO rate/Sec for tuning changes
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com http://www.responsivesystems.com
tel. (732) 972-1261
fax.(732) 972-9416

[login to unmask email]



From: Jorg Lueke [mailto:[login to unmask email]
Sent: Tuesday, April 30, 2019 12:48 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: Disproportional CPU Hike



Here are some BP stats

Getpages per Sync I/O . : 9.81 Pages Written per Write I/O : 1.04
Prefetch per I/O . . . .: 1.20 Pages Read per Prefetch . . : 25.07
Seq Prefetch per I/O . .: 1.36 Pages Read per Seq Prefetch : 40.31
List Prefetch per I/O . : 1.02 Pages Read per List Prefetch: 24.18
Dyn Prefetch per I/O . .: 1.21 Pages Read per Dyn Prefetch : 25.13
Max Concur Prefetch . . : 36 Workfile Maximum . . . . . .: 0
BP Hit % - Random . . . : 88.1% Virtual Page Steal Method . : LRU
BP Hit % - Sequential . : 10.0%

DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 5058 MAX = 5058
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 3 MAXIMUM DELAY = 59
TOTAL PAGES = 892
DSNB467I
INSTANCE 1 -
DATA SET #: 1 USE COUNT: 1
DSNB453I
CURRENT = 13077 MAX = 13077
CHANGED = 0 MAX = 0
DSNB455I
AVERAGE DELAY = 2 MAXIMUM DELAY = 18
TOTAL PAGES = 297
DSNB467I



-----End Original Message-----



-----End Original Message-----

Michael Hannan

RE: Disproportional CPU Hike
(in response to Jorg Lueke)

Jorg,

I think I could do a better SQL rewrite if I understand the cardinality of the tables and each column participating in predicates and how big the the range is usually on the range predicate column. i.e. we need to understand something about the filtering levels of the columns. What is the Unique Index (the columns) and what is the cluster index? Give the average  CPU per execution of The Open (without parallel and average Fetches per Open). 

If access path is not good, I don't see that BP tuning will make a  massive difference. The right SQL and access path might save 95%. BP tuning makes more modest saves generally, unless there is a serious BP fault.

 I won't invest more time though without the critical details. Also will you consider an SQL rewrite if it performs vastly better?

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ May 07, 2019 - 09:54 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ May 07, 2019 - 09:55 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ May 07, 2019 - 09:57 AM (Europe/Berlin)