Blog post on recent Db2 for z/OS RUNSTATS performance enhancements

Terry Purcell

Blog post on recent Db2 for z/OS RUNSTATS performance enhancements

https://www.worldofdb2.com/profiles/blogs/massive-runstats-cpu-reductions-when-collecting-db2-for-z-os

Regards

Terry Purcell

Walter Janißen

AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Terry Purcell)
Hi Terry

Impressive, but the problem still remains to decrease the number of recommendations. My reorg-test during ESP with all the recommendation would drop down to 8 minutes elapsed, but this is still much more than 1 minute with only the default statistics. Following the example of Bonnie Baker: The cheapest collections are the one you don’t need to run.

Kind regards
Walter Janißen [standard_IBM+Champ+7+Yr+Analytics]

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Terry Purcell [mailto:[login to unmask email]
Gesendet: Montag, 11. Dezember 2017 18:48
An: [login to unmask email]
Betreff: [DB2-L] - Blog post on recent Db2 for z/OS RUNSTATS performance enhancements


https://www.worldofdb2.com/profiles/blogs/massive-runstats-cpu-reductions-when-collecting-db2-for-z-os

Regards

Terry Purcell

-----End Original Message-----
Attachments

  • image001.png (2.6k)

Terry Purcell

RE: AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Walter Janißen)

Walter,

Your direct feedback has been invaluable for the evolution of this Db2 feature over the past few years - so thank you.

With regard to your comments here - I would like to highlight that our first goal is to ensure that the optimizer has statistics that are targeted to the SQL statements in each workload. We realize however that there will be some instances where more statistics are recommended than are necessary. These are expected to be minor.

Such situations aren’t always as easy to resolve as you may think. When it’s simple or if it is a widespread issue - we work to solve them as quickly as possible. But if it’s not simple - we look to other solutions or opportunities.

Based upon feedback - we have been diligently working to improvement performance (the blog is an example) and also have continued to reduce situations where recommendations are made that do not provide value. Some of the prior enhancements and those that are currently in the pipeline came directly from you (again - thank you).

There still exist opportunities for Db2 to improve performance of RUNSTATS. And while we work on this - the zIIP offload capabilities do provide some relief.

We do feel strongly that overall this statistics feedback enhancement can help provide more valuable input to the optimizer for more reliable access path choices.

Regards
Terry Purcell

In Reply to Walter Janißen:

Hi Terry

Impressive, but the problem still remains to decrease the number of recommendations. My reorg-test during ESP with all the recommendation would drop down to 8 minutes elapsed, but this is still much more than 1 minute with only the default statistics. Following the example of Bonnie Baker: The cheapest collections are the one you don’t need to run.

Kind regards
Walter Janißen [standard_IBM+Champ+7+Yr+Analytics]

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Terry Purcell [mailto:[login to unmask email]
Gesendet: Montag, 11. Dezember 2017 18:48
An: [login to unmask email]
Betreff: [DB2-L] - Blog post on recent Db2 for z/OS RUNSTATS performance enhancements


https://www.worldofdb2.com/profiles/blogs/massive-runstats-cpu-reductions-when-collecting-db2-for-z-os

Regards

Terry Purcell

-----End Original Message-----

Michael Hannan

RE: AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Walter Janißen)

Walter,

It is not so clear, what is the problem you allude to, so I may be on a tangent. The cost of running the application SQL Queries is normally a lot more important than the cost of collecting Runstats from time to time (not with ridiculous frequency).

It is important to provide the Stats that the Optimizer needs rather than rely on Default Stats, just because they are cheaper. At one time KEYCARD Stats was not the default, and so many sites refused to collect them, despite being absolutely vital to calculating filter factors for multi column index match, that IBM was forced to make them non optional.

Runstats has zIIP offload to reduce its cost greatly and also the performance enhancements discussed. I don't know that elapsed time of it, is so important. I would collect the most expensive Stats at a quiet time.

In general, I think many sites are collecting many types of Runstats more often than necessary.

I believe it becomes important to understand with Db2 Stats, which ones are collected very cheaply and which are costly. Extremely cheap to collect Stats can be updated often if desired. Very expensive stats like frequent value stats not on index column(s) (or column groups) don't get to be very misrepresentative in a big hurry on many columns. Some Stats that are useful and costly to collect, can be collected on a rather infrequent basis to alleviate the cost or runtime.

I could see a case for monitoring the level of change in certain Stats and taking into account how Db2 uses them, then decide how often they need to be updated, even though no sites are probably currently doing this. e.g. Freq Value percentages for a column changed very slightly in 1 month. No reason to collect them frequently.  Suppose HIGH2KEY and LOW2KEY and COLCARDF changed only very slightly for a column over a month. Could recollect very infrequently. On other hand if HIGH2KEY is changing all the time like for a TIMESTAMP, and have range predicates with literals, then might need to keep it reasonably up to date, although Db2 has special techniques to look after range predicates outside the LOW2KEY and HIGH2KEY, like Index non-leaf sampling.

Not all Stats types are equally useful, nor need to be updated equally frequently. In my opinion, it is a mistake to treat all stats types ever recommended as having the same importance and frequency for update requirements.

Are our Stats so wrong, that they are fooling the Optimizer and resulting in wrong or sub-optimal access paths? That is the important question for me. Near enough Stats are usually good enough. The worst Stats are clearly those that incorrectly show the table is empty, when it has actually grown significantly. For highly dynamic data, Stats will never be 100% correct. So reasonably representative Stats are quite O.K.

Range predicate default interpolation filter factors are still used by the Optimizer, although have been improved for  < CURRENT DATE and < CURRENT TIMESTAMP predicates. All the Runstats in the world do not fix these Default Range Interpolation Filter Factors. So we must look at using DSN_PREDICATE_SELECTIVITY Hints to fix them.

We all should consider the problems of the Optimizer getting the right access paths and act accordingly with our Db2 Stats collection and other measures. Otherwise too frequent and too costly collection of Stats could be a solution looking for a problem. Sites could collect expensive Stats frequently that will never get used. Not all Stats are equally important to be very up to date.


In Reply to Walter Janißen:

Hi Terry

Impressive, but the problem still remains to decrease the number of recommendations. My reorg-test during ESP with all the recommendation would drop down to 8 minutes elapsed, but this is still much more than 1 minute with only the default statistics. Following the example of Bonnie Baker: The cheapest collections are the one you don’t need to run.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 31, 2017 - 05:24 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 31, 2017 - 05:25 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 31, 2017 - 05:34 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 31, 2017 - 05:35 AM (Europe/Berlin)

Terry Purcell

RE: AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Michael Hannan)

Michael,

First my apologies that I wasn't clear to point out that Walter does have an example where his concern is legitimate. From what I recall (it's been approx 2 years?? since I discussed this particular example with him) - it is a scenario where there is a low cardinality equal predicate with literals and the optimizer recommends a frequency statistic. But there is a unique (or very high cardinality???) matching index available - so the frequency statistic is unlikely to improve the access path. Given that there are queries against numerous columns, this results in 100s of recommendations. Again - I am going off of my memory here.

Walter has other examples where there are more recommendations than necessary - but in most cases we have solutions in plan. It is also possible though that Walter wasn't specifically responding to the example I cite above.

I do agree with what you have stated in your reply here however. Unfortunately not all Db2 customers have the necessary skill to make the determination of how to manage statistics collection. And we are trying to make this area simpler to manage. But this is a journey that will continually evolve/improve.

In my initial response I was attempting to highlight that Walter's legitimate example should not detract from the statistics recommendations being valuable to the optimizer in the vast majority of cases, and therefore that this is a beneficial feature from Db2 11 & 12 for z/OS.

Thanks

Terry Purcell

In Reply to Michael Hannan:

Walter,

It is not so clear, what is the problem you allude to, so I may be on a tangent. The cost of running the application SQL Queries is normally a lot more important than the cost of collecting Runstats from time to time (not with ridiculous frequency).

It is important to provide the Stats that the Optimizer needs rather than rely on Default Stats, just because they are cheaper. At one time KEYCARD Stats was not the default, and so many sites refused to collect them, despite being absolutely vital to calculating filter factors for multi column index match, that IBM was forced to make them non optional.

Runstats has zIIP offload to reduce its cost greatly and also the performance enhancements discussed. I don't know that elapsed time of it, is so important. I would collect the most expensive Stats at a quiet time.

In general, I think many sites are collecting many types of Runstats more often than necessary.

I believe it becomes important to understand with Db2 Stats, which ones are collected very cheaply and which are costly. Extremely cheap to collect Stats can be updated often if desired. Very expensive stats like frequent value stats not on index column(s) (or column groups) don't get to be very misrepresentative in a big hurry on many columns. Some Stats that are useful and costly to collect, can be collected on a rather infrequent basis to alleviate the cost or runtime.

I could see a case for monitoring the level of change in certain Stats and taking into account how Db2 uses them, then decide how often they need to be updated, even though no sites are probably currently doing this. e.g. Freq Value percentages for a column changed very slightly in 1 month. No reason to collect them frequently.  Suppose HIGH2KEY and LOW2KEY and COLCARDF changed only very slightly for a column over a month. Could recollect very infrequently. On other hand if HIGH2KEY is changing all the time like for a TIMESTAMP, and have range predicates with literals, then might need to keep it reasonably up to date, although Db2 has special techniques to look after range predicates outside the LOW2KEY and HIGH2KEY, like Index non-leaf sampling.

Not all Stats types are equally useful, nor need to be updated equally frequently. In my opinion, it is a mistake to treat all stats types ever recommended as having the same importance and frequency for update requirements.

Are our Stats so wrong, that they are fooling the Optimizer and resulting in wrong or sub-optimal access paths? That is the important question for me. Near enough Stats are usually good enough. The worst Stats are clearly those that incorrectly show the table is empty, when it has actually grown significantly. For highly dynamic data, Stats will never be 100% correct. So reasonably representative Stats are quite O.K.

Range predicate default interpolation filter factors are still used by the Optimizer, although have been improved for  < CURRENT DATE and < CURRENT TIMESTAMP predicates. All the Runstats in the world do not fix these Default Range Interpolation Filter Factors. So we must look at using DSN_PREDICATE_SELECTIVITY Hints to fix them.

We all should consider the problems of the Optimizer getting the right access paths and act accordingly with our Db2 Stats collection and other measures. Otherwise too frequent and too costly collection of Stats could be a solution looking for a problem. Sites could collect expensive Stats frequently that will never get used. Not all Stats are equally important to be very up to date.


In Reply to Walter Janißen:

Hi Terry

Impressive, but the problem still remains to decrease the number of recommendations. My reorg-test during ESP with all the recommendation would drop down to 8 minutes elapsed, but this is still much more than 1 minute with only the default statistics. Following the example of Bonnie Baker: The cheapest collections are the one you don’t need to run.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Michael Hannan

RE: AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Terry Purcell)

Terry,

Thanks, Indeed I was focussing on the performance of Runstats, rather than what Optimizer recommendations are made. I have not played with recommendations enough to know that area in great detail, having lived with no recommendations or my own recommendation queries for so many years. I have seen perhaps overly frequent collection of Stats in some sites.

The anomaly you mention is very understandable, based on a column predicate considered in Isolation. I do like to know Freq Val stats for low cardinality columns used with Equals a Literal, even if was only collected once. I take the point here that it doesn't matter if query performs optimally already using some independent access path.

You mention IBM will refine the recommendations code in the future to get rid of the anomalies like the example you mention. The recommendations can be overkill at first. I did notice there seemed to be quite aggressive reporting of "Stale" Stats.

In Reply to Terry Purcell:

Michael,

First my apologies that I wasn't clear to point out that Walter does have an example where his concern is legitimate. From what I recall (it's been approx 2 years?? since I discussed this particular example with him) - it is a scenario where there is a low cardinality equal predicate with literals and the optimizer recommends a frequency statistic. But there is a unique (or very high cardinality???) matching index available - so the frequency statistic is unlikely to improve the access path. Given that there are queries against numerous columns, this results in 100s of recommendations. Again - I am going off of my memory here.

 

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Jan 02, 2018 - 05:36 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Jan 02, 2018 - 06:21 AM (Europe/Berlin)

Terry Purcell

RE: AW: Blog post on recent Db2 for z/OS RUNSTATS performance enhancements
(in response to Michael Hannan)

Thanks Michael.

Yes - STALE recommendations are something that Walter has also highlighted. This is something that we would like to improve upon.

Regards

Terry

In Reply to Michael Hannan:

Terry,

Thanks, Indeed I was focussing on the performance of Runstats, rather than what Optimizer recommendations are made. I have not played with recommendations enough to know that area in great detail, having lived with no recommendations or my own recommendation queries for so many years. I have seen perhaps overly frequent collection of Stats in some sites.

The anomaly you mention is very understandable, based on a column predicate considered in Isolation. I do like to know Freq Val stats for low cardinality columns used with Equals a Literal, even if was only collected once. I take the point here that it doesn't matter if query performs optimally already using some independent access path.

You mention IBM will refine the recommendations code in the future to get rid of the anomalies like the example you mention. The recommendations can be overkill at first. I did notice there seemed to be quite aggressive reporting of "Stale" Stats.

In Reply to Terry Purcell:

Michael,

First my apologies that I wasn't clear to point out that Walter does have an example where his concern is legitimate. From what I recall (it's been approx 2 years?? since I discussed this particular example with him) - it is a scenario where there is a low cardinality equal predicate with literals and the optimizer recommends a frequency statistic. But there is a unique (or very high cardinality???) matching index available - so the frequency statistic is unlikely to improve the access path. Given that there are queries against numerous columns, this results in 100s of recommendations. Again - I am going off of my memory here.

 

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd