Increase in CPU time after DB2 11 upgrade

Ravikumar sri2001

Increase in CPU time after DB2 11 upgrade

Experts,

I am analyzing a performance issue in an IMS transaction that issues about 15K to 30K DB2 SQL statements. The customer upgraded DB2 to version 11 in mid-September 2017, and we have noticed an increase in average CPU time per transaction since then. The DBA feels the increase is due to data growth in the tables involved in the transaction and to application changes implemented in production. The DBA also points out that if the Db2 upgrade were causing the issue, it should have increased CPU in all other applications as well; since those applications are running fine, he concludes there is no issue with the Db2 upgrade.

This transaction invokes about 12 COBOL subprograms. I tried to determine whether code changes and data growth are contributing to the CPU increase, but I could not find a convincing answer. Packages that have not gone through any code changes since August 2017 are also reporting increased CPU time.

Could you please advise how to approach this issue?

Thanks,
Ravikumar

Michael Hannan

RE: Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)

Ravikumar,

If some Packages are performing just fine, that proves absolutely nothing about whether the upgrade is causing any CPU increases. Different Packages do different things, and only a very small number of Packages may use some DB2 feature, or access path, that was affected.

Db2 access paths can change at a new release, immediately for dynamic SQL and after a REBIND for static SQL. Even without an access path change, the underlying workings can change.

However, I would not be too quick to blame Db2 11. The problem is more likely to be elsewhere.

So some basic things:

Has the DB2 portion of the cost increased, and for which packages? SMF Db2 accounting data might be needed; you may not have that performance information in place. Has Db2 CPU time per transaction increased?
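
If that accounting data is not yet being collected, it can be switched on with something along these lines (a sketch; the destination and classes depend on your monitoring setup, and classes 7 and 8 add the package-level detail):

   -START TRACE(ACCTG) CLASS(1,2,3,7,8) DEST(SMF)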

Are we talking about Static SQL? Have any Package REBINDs occurred? If performance history has been captured, was there a sudden change after REBIND? Did access paths change?

Did the tables involved grow dramatically in number of rows? With weak access paths, a growth in table size brings a CPU increase, because the scanning gets longer. With really good, well-indexed access paths, growth in table size may not grow CPU significantly, perhaps only a minor change.
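
Where RUNSTATS history is kept, a catalog query along these lines shows the growth trend (a sketch; the creator and table names are placeholders):

   SELECT NAME, CARDF, NPAGESF, STATSTIME
     FROM SYSIBM.SYSTABLES_HIST
    WHERE CREATOR = 'MYSCHEMA'
      AND NAME IN ('TABLE1', 'TABLE2')
    ORDER BY NAME, STATSTIME;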

It's hard to do much if your site is not well set up to capture DB2 accounting data and keep a history of performance data, and to capture Explain access paths and track their history of change as well.

Right now I don't recall any specific thing in Db2 11 that might cause performance degradation without a change to a less optimal access path. Perhaps others will remember some very specific dangers in the release, or particular access path changes that did not go well. I recall something more so for Db2 10.

Mostly I recall things that performed better in Db2 11 and 12, due to advances in the Db2 Optimizer and in the subsystem.

For Db2 10 I ran a site exercise to look at average CPU change in packages with and without REBIND, and with and without access path change (the results are site dependent). Great complexity was involved in ensuring a fair measurement before and after the release. Db2 10 packages showed roughly a 2 to 3% improvement with no access path change, and somewhat more with access path changes, after allowing for backing out the REBINDs of certain pre-existing problem access paths that needed an OPTHINT to get a much better access path.

As the exercise was so time-consuming to do really well, I have not repeated such a measurement for Db2 11. The danger of doing the exercise crudely is getting garbage results. I had to measure only packages with very stable usage and performance; packages with ad hoc, unpredictable performance cannot easily be compared from one release to another.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Roy Boxwell

Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)
One of my customers had one change in one access path in one package that was a disaster. Went from I1 access to R...
PMR was opened, APAR & PTF delivered, applied, all was well.
Check you are up to date on maintenance and review all HOLD actions!

Roy Boxwell

SOFTWARE ENGINEERING GMBH and SEGUS Inc.
-Product Development-

Heinrichstrasse 83-85
40239 Duesseldorf/Germany
Tel. +49 (0)211 96149-675
Fax +49 (0)211 96149-32
Email: [login to unmask email]
http://www.seg.de

Software Engineering GmbH
Amtsgericht Düsseldorf, HRB 37894
Geschäftsführung: Gerhard Schubert


Avram Friedman

RE: Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)

The default suggestion for performance hits after service is REBINDs. As the original poster also notes data volatility as a possible cause, I would suggest RUNSTATS followed by REBINDs.
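
A sketch of that sequence, with placeholder names (the RUNSTATS goes in a utility job step; the REBIND is issued from a DSN command session, with EXPLAIN(YES) capturing the new access paths into PLAN_TABLE):

   RUNSTATS TABLESPACE MYDB.MYTS TABLE(ALL) INDEX(ALL) UPDATE ALL HISTORY ALL

   REBIND PACKAGE(MYCOLL.MYPKG) EXPLAIN(YES)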

If you think this issue may merit eventual support involvement (most reports do not), then ensure you capture configuration data such as plan tables, stats, and trace data before making a change.

Avram Friedman
DB2-L hall of fame contributor
DB2-L acting administrator

[login to unmask email]

Michael Hannan

RE: Increase in CPU time after DB2 11 upgrade
(in response to Roy Boxwell)

In Reply to Roy Boxwell:

One of my customers had one change in one access path in one package that was a disaster. Went from I1 access to R...
PMR was opened, APAR & PTF delivered, applied, all was well.
Check you are up to date on maintenance and review all HOLD actions!

Roy Boxwell

It has always been possible for the Optimizer to choose a TS scan where it is not actually optimal, because the Optimizer has no clue whether the SQL will be executed once per transaction or repeated in a loop, or because the stats do not represent a realistic approximation of the table size. This matters for real performance because of lookaside-type capabilities.

That is indeed a strange access path change, and a good case from a problem report. Earlier Db2 versions always had I1 (Index One Fetch) as virtually the highest-priority access path, along with a full equals match on a unique index (it never happens that both are possible at the same time). Personally I hate tablespace scans (access type R) for OLTP applications, even on the smallest of tables, as they tend to cost 2 Getpages per access. Any repeated access by an index (in the same transaction) can potentially average fewer than 2 Getpages per call (or per probe in a join/subquery) using lookaside. If the index access is not optimal compared to a TS scan when only a single probe or call per transaction is made, then one usually does not care. We tune for the high-cost SQLs and worry little about the low-cost ones.

I am often tempted to make small tables VOLATILE to encourage index use. You really don't want stats saying the table is empty (or near to it) when it actually has a significant number of rows. VOLATILE is a protection against tables suddenly growing significantly bigger than the current RUNSTATS suggest and getting bad TS scan access paths.
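
A minimal sketch, with a placeholder table name (VOLATILE tells the optimizer to prefer index access regardless of what the statistics say about the table's size):

   ALTER TABLE MYSCHEMA.SMALL_CODE_TABLE VOLATILE;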

I really try to avoid this attitude: "The table is small, so it does not need indexes. A TS scan won't take long." My attitude is that small tables can afford many indexes, carefully crafted to suit the queries.

Getting back on topic slightly: I recall that in Db2 11, aggregating queries (COUNT(*), SUM(col), etc.) were enhanced so that a List Prefetch access path could have its collected RIDs overflow to a workfile. If the existing access path was already a bad MX/MI, with some legs doing no filtering, then a change to overflow RIDs to a workfile could theoretically be detrimental. I can't say I actually saw it happen in Db2 11. I am really struggling to think of many things that could have gone bad in Db2 11, unless it was a fault/bug.

I have seen the odd non-ideal NR (Range List) access path, which could occur in Db2 11, even though NR first arrived in Db2 10. However, 95%+ of NR access paths work pretty well.

Any release of Db2 can get a far-from-optimal access path when there are range predicates on host variables, equals or IN predicates on host variables when the table data is skewed, or joins/subqueries where DB2 has no concept of the relative sequencing of rows between the tables accessed, causing undesirable random access. Those are my three favourite problems where the Optimizer has an unrealistic idea of the true cost of the access path.

I would expect Db2 11 with up-to-date maintenance to perform pretty well overall, barring changes to access paths that were not the best. It can be important to review access path changes in certain categories.

A TS scan is OK when you have no filtering predicates and want the whole table. LOL. It is still OK if you want more than 25% of the data rows (the rule of thumb) for medium to large tables. Otherwise I really don't like it. What about a 1-row table? Even there I don't like a TS scan as the access path for very frequent SQL calls, as it will probably cost 2 Getpages for just one row. The reason, I think, is that a TS scan has to examine the spacemap page first, to determine which segments and pages are to be scanned, before the 1 data page.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Tommy Petersen

RE: Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)

It could be caused by some predicates now being stage 1, which can lead DB2 to use a different access path, combined with queries that don't read the entire result set.

If a function is used on an indexed column, such as YEAR(datecolumn) or SUBSTR(column,1,n), DB2 V10 could not make efficient use of the index. If there is also a different index that matches the ORDER BY clause, that index might be used to avoid the sort. Because DB2 only needs to read enough rows to fulfill the fetches from the program, it may never read all the rows needed to complete the query.

DB2 V11 can make efficient use of the above predicates; in theory that could greatly reduce the number of rows read, so running the sort would not negate the savings of using the more efficient index. The downside is that DB2 must always complete the query, so more rows may be read.

I don't know how much saving would be needed for DB2 to give up the sort avoidance.

The solution is simply to add OPTIMIZE FOR 1 ROW to the query; it should then revert to sort avoidance.
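
For example (a hypothetical query, just to show where the clause goes):

   SELECT ORDER_NO, ORDER_DATE, AMOUNT
     FROM MYSCHEMA.ORDERS
    WHERE YEAR(ORDER_DATE) = 2017
    ORDER BY ORDER_NO
   OPTIMIZE FOR 1 ROW;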

Ravikumar sri2001

RE: Increase in CPU time after DB2 11 upgrade
(in response to Tommy Petersen)

Experts,

Sorry for the late reply.!!

The COBOL packages involved in the IMS transaction have been rebound multiple times, so I do not have PLAN_TABLE history to see whether the access paths have changed.

Every transaction issues about 10K to 20K SQLs. More than 100 tables are used in the transaction, and all of them are on SSD. Looking at SYSIBM.SYSTABLES_HIST for data growth, I could not find any sudden growth. None of these DB2 tables has been converted to extended RBA.

My customer uses Tivoli Decision Support (TDS) to store the Db2 accounting data; daily accounting data is rolled up into a monthly table, and the daily data is kept for only the last 10 days.

Below is the month-by-month performance data for the transaction.

Date         CLASS2_CPU (sec)   CLASS1_CPU (sec)   Transactions
2017-02-01       118,682.55         155,205.73       1,028,796
2017-03-01       210,323.05         275,578.28       1,754,093
2017-04-01       173,384.10         227,123.02       1,538,328
2017-05-01       217,804.79         284,819.29       1,743,415
2017-06-01       216,750.39         281,739.27       1,602,015
2017-07-01       215,690.01         280,207.68       1,652,358
2017-08-01       227,754.63         296,358.31       1,665,853
2017-09-01       326,732.92         405,144.76       1,722,512
2017-10-01       413,688.78         496,665.51       1,759,998
2017-11-01       464,431.46         561,095.52       2,370,216
2017-12-01       454,914.52         548,660.97       2,203,784
2018-01-01       312,212.56         380,858.82       1,641,039

Below is the data for one of the packages involved in the transaction. Comparing Jan-2018 with Aug-2017, CPU time has increased, yet this package has not gone through any code change; it was only recompiled due to a working-storage copybook change.

DATE       PKG_TCB_SEC
02/01/17     32,798.74
03/01/17     52,664.94
04/01/17     46,136.19
05/01/17     55,189.61
06/01/17     53,837.44
07/01/17     57,887.65
08/01/17     59,800.97
09/01/17     83,597.83
10/01/17    103,166.58
11/01/17    108,625.08
12/01/17    100,548.00
01/01/18     73,611.61


Thanks,
Ravikumar

Jim Tonchick

Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)

You mentioned that the COBOL programs were rebound several times. Were these BIND with REPLACE or REBIND commands?

If they were REBIND commands and you use PLANMGMT, then you'll be able to "roll back" to the previous version of the package or to the original bound version.

I have the zparm set to use PLANMGMT EXTENDED by default. This has saved us a couple of times when applying PTFs or as part of a release upgrade. We REBIND the most active packages after maintenance or an upgrade; if performance drops, we just REBIND with SWITCH(PREVIOUS).
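
A sketch of that sequence, with placeholder collection and package names:

   REBIND PACKAGE(MYCOLL.MYPKG) PLANMGMT(EXTENDED)

and, if performance drops after the rebind:

   REBIND PACKAGE(MYCOLL.MYPKG) SWITCH(PREVIOUS)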


Michael Hannan

RE: Increase in CPU time after DB2 11 upgrade
(in response to Ravikumar sri2001)


Ravikumar,

Does SSD mean solid-state disk? It is not exactly a well-known abbreviation in Db2 terms.

The information you give suggests an access path problem potentially occurring when you went to Db2 11 in Sep 2017. Not all access path changes are beneficial, although the vast majority (like 95%) are. There is no proof, of course.

It is likely this transaction has always had some access path problem, based on the data for Jan 2017 by day. The CPU seems to grow dramatically during the month, suggesting growth of an important table from nearly empty, or a table that gets REORGed and then becomes incredibly disorganised. What of importance happens at the start of the month that makes the CPU much lower? Normally we don't want to see a transaction whose costs increase throughout the month.

You really don't want to keep rebinding a package too often. The access path could change, and bad performance is possible.

When a REBIND or BIND occurs, you should keep the PLAN_TABLE access paths, and use PLANMGMT(EXTENDED) to enable REBIND SWITCH back to the previous copy in case of a bad access path change, when you wish to get the old access path back again. Retaining duplicates is best to ensure SWITCH is possible.

I am not at all in favour of aggressive clean-up of PLAN_TABLE. Keep data there for at least 6 months, and in your site's case perhaps longer, if it takes many months to realise you have a performance problem and to diagnose it. Also, do not clean up the last 3 BIND/REBINDs regardless of how old they are. The same applies to the other extended Explain tables.
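
Where that history is retained, a query along these lines (the package name is a placeholder) shows how a package's access paths have moved across binds:

   SELECT QUERYNO, BIND_TIME, TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME
     FROM PLAN_TABLE
    WHERE PROGNAME = 'MYPKG'
    ORDER BY QUERYNO, BIND_TIME DESC;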

At some sites I have been tempted to implement history tables for Explain, because the site cleaned out the Explain data much too fast. At other sites we just have to persuade them to collect access paths in the first place.

With plan management EXTENDED on, you can get the last 3 access path iterations back using the EXPLAIN PACKAGE command, specifying a copy ID; however, that may not go back as far as 6 months if REBINDs have been frequent.
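
A sketch of that command, with placeholder names (Db2 10 or later; the extracted rows land in the PLAN_TABLE of the user issuing the EXPLAIN):

   EXPLAIN PACKAGE COLLECTION 'MYCOLL' PACKAGE 'MYPKG' COPY 'PREVIOUS';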

I don't like to see very frequent REBINDs. If you already have a well-performing access path, you don't want to REBIND unless you have to, or unless you are trying to change to a better access path. It does make sense to REBIND at least once per Db2 release to get potentially better performance.

Keeping daily accounting data for just 10 days is not really good enough. It is possible to drop a lot of the less important fields, and to keep numeric columns as data type REAL (4-byte FLOAT) to save space, summarizing appropriately; then keep the data for months. Keeping package-level summarized accounting data for months is also important. I like to see at least 13 months of history (LOL) so this year's performance can be compared to last year's.

If you process the package data right now, you might learn which package is performing so poorly. It is then possible to analyse its SQL and access paths to see where the likely problem is. Even better is to have a tool that can report CPU usage by SQL statement, for the case where a poorly performing package has too many SQLs in it. Essentially these are things I use regularly.

Understanding performance is all about having the performance data to help. It's even wise to have hourly CPU data with some level of history; it helps to see whether a cost change coincided with a REBIND of the package. The Explain data in the PLAN_TABLE becomes the historical record of when package REBINDs occurred, provided it is not cleaned out.

Disk space is quite cheap these days. Don't be too fast to clean out important history.

My conclusion is that you likely have access path problems in your SQLs, which may have coincided with a change of access path selection at the new release. There are certainly other possibilities to consider once you decide the access paths are good. A fault in Db2 11 requiring a fix is low probability, but possible. A non-optimal access path selected by Db2 11 is a good possibility. The Db2 Optimizer does not understand the data thoroughly and is limited by the information available. Sites have certainly experienced access paths going bad; that is why frequent automated REBINDs are best avoided. There is a risk.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd
