DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Bill Gallagher

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
Good afternoon,

We're looking into enhancing the automation of our RUNSTATS processing and utilizing the data in the RTS tables to determine if and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:


1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition, just to make sure that we don't have any stale statistics (which we currently do).

A question came up about the impact of doing that. Basically, is there any benefit to running RUNSTATS against tablespaces that contain very low levels of or no changed data, and conversely, is there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths to misbehave when the tablespaces or indexes that they access are updated by RUNSTATS (as far as I know, we do not), my assumptions are the following:

Benefits: we could pick up different (and hopefully better) access paths with current RUNSTATS due to DB2 maintenance or version upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS on a tablespace that has not been updated since the last time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
________________________________
This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201

Philip Sevetson

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Bill Gallagher)
Bill G.,

Rule 3 doesn't make any sense to me. I can't see how re-executing a particular RUNSTATs statement, without data changes, could make a difference in access paths; as far as I know, the optimizer doesn't check dates as part of the rules.

The two reason I can think of to re-execute RUNSTATs, in the absence of some arbitrarily significant amount of changed data, are

1) A code-level utility upgrade which provides different results from a particular instance of the utility. I'm pretty sure that doesn't happen with every upgrade, and I'd expect to see it in the release notes.

2) Changing the statement to provide different output (FREQVAL settings)

REBIND, now, when you change code levels in the optimizer - that's obviously a different matter. I recommend doing that, with version stabilization coded in. Or you could just create a "next" version of the package(s) and compare access paths, new to old.

--Phil

From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 12:51 PM
To: [login to unmask email]
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Good afternoon,

We're looking into enhancing the automation of our RUNSTATS processing and utilizing the data in the RTS tables to determine if and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:


1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition, just to make sure that we don't have any stale statistics (which we currently do).

A question came up about the impact of doing that. Basically, is there any benefit to running RUNSTATS against tablespaces that contain very low levels of or no changed data, and conversely, is there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths to misbehave when the tablespaces or indexes that they access are updated by RUNSTATS (as far as I know, we do not), my assumptions are the following:

Benefits: we could pick up different (and hopefully better) access paths with current RUNSTATS due to DB2 maintenance or version upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS on a tablespace that has not been updated since the last time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
________________________________
This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201
-----End Original Message-----
**This e-mail, including any attachments, may be confidential, privileged, or otherwise legally protected. It is intended only for the addressee. If you received this e-mail in error or from someone who was not authorized to send it to you, do not disseminate, copy, or otherwise use this e-mail or its attachments. Please notify the sender immediately by reply e-mail and delete the e-mail from your system.**

Bill Gallagher

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Philip Sevetson)
Phil,

Thanks for the response. I inadvertently omitted one of the reasons why we are considering #3, but you touched upon it. It would be to pick up any changes to the RUNSTATS utility itself that could have introduced either by software maintenance or a version upgrade. Release notes, or verbiage in a PTF, should identify that.

I suppose an alternative to just doing it every "x" months would be to trigger off a date that we can store in a table which would correspond to when such a software change may have been applied to that subsystem.

As to your other comment (which referred to FREQVAL), we are also incorporating information from SYSSTATFEEDBACK into our new process. So new queries against old, unchanging data, could also trigger a RUNSTATS if SYSSTATFEEDBACK is populated.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration

From: Sevetson, Phil <[login to unmask email]>
Sent: Monday, September 10, 2018 2:09 PM
To: '[login to unmask email]' <[login to unmask email]>
Subject: [DB2-L] - RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Bill G.,

Rule 3 doesn't make any sense to me. I can't see how re-executing a particular RUNSTATs statement, without data changes, could make a difference in access paths; as far as I know, the optimizer doesn't check dates as part of the rules.

The two reason I can think of to re-execute RUNSTATs, in the absence of some arbitrarily significant amount of changed data, are

1. A code-level utility upgrade which provides different results from a particular instance of the utility. I'm pretty sure that doesn't happen with every upgrade, and I'd expect to see it in the release notes.
2. Changing the statement to provide different output (FREQVAL settings)

REBIND, now, when you change code levels in the optimizer - that's obviously a different matter. I recommend doing that, with version stabilization coded in. Or you could just create a "next" version of the package(s) and compare access paths, new to old.

--Phil
From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 12:51 PM
To: [login to unmask email]<mailto:[login to unmask email]>
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Good afternoon,

We're looking into enhancing the automation of our RUNSTATS processing and utilizing the data in the RTS tables to determine if and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:


1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition, just to make sure that we don't have any stale statistics (which we currently do).

A question came up about the impact of doing that. Basically, is there any benefit to running RUNSTATS against tablespaces that contain very low levels of or no changed data, and conversely, is there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths to misbehave when the tablespaces or indexes that they access are updated by RUNSTATS (as far as I know, we do not), my assumptions are the following:

Benefits: we could pick up different (and hopefully better) access paths with current RUNSTATS due to DB2 maintenance or version upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS on a tablespace that has not been updated since the last time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
________________________________
This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201
-----End Original Message-----
**This e-mail, including any attachments, may be confidential, privileged, or otherwise legally protected. It is intended only for the addressee. If you received this e-mail in error or from someone who was not authorized to send it to you, do not disseminate, copy, or otherwise use this e-mail or its attachments. Please notify the sender immediately by reply e-mail and delete the e-mail from your system.**
-----End Original Message-----

Lockwood Lyon

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Bill Gallagher)
Bill,

IMHO, I think that automating running RUNSTATS is a waste of effort
that implements a costly procedure.

I believe that RUNSTATS gathers statistics by reading every row in
the table / index(es). If so, it's a costly process both CPU- and
I/O-wise.

Consider:
- If you have a lot of statically bound packages, RunStats
doesn't help unless you ReBind; and even then, you need to be
careful that access path(s) don't regress. Do you have a plan
management scheme implemented?
- If you have a lot of dynamic SQL, don't you run the risk
of changing access paths without knowing what effects that will have
... until someone runs a critical on-line application?
- If there has been a significant amount of Ins/Upd/Del
activity since the last execution of RunStats, ... well, if you
implement your automation then volatile tables will get RunStats run
regularly. Is this wise? Didn't execution N of RunStats measure the
effects of the updates, and won't the N+1 execution give you the
same info? (Excepting, of course, cardinality changes.) Oh, and if
these tables are volatile, aren't you running Reorgs? If so, won't
you have a Stats specification as part of the Reorg? Do you really
want lots of activity on a table to trigger a RunStats, and not a
Reorg, at least for the indexes?

Last, don't forget publish/subscribe table pairs, replication pairs,
and any self-built clones. These encompass situations where a table
in location A is copied verbatim on a regular basis to location B.
Example: Operational tables thatare copied to the data warehouse, DW
dimension tables that are published to data marts, etc. These
target tables may get changed/loaded daily, yet their contents may
not change!

Just some thoughts, may not be applicable to your enterprise.

- Lock Lyon


From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 12:51 PM
To: [login to unmask email]
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging)
Tables

Good afternoon,

We're looking into enhancing the automation of our RUNSTATS
processing and utilizing the data in the RTS tables to determine if
and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:

1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete
activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition,
just to make sure that we don't have any stale statistics (which we
currently do).

A question came up about the impact of doing that. Basically, is
there any benefit to running RUNSTATS against tablespaces that
contain very low levels of or no changed data, and conversely, is
there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths
to misbehave when the tablespaces or indexes that they access are
updated by RUNSTATS (as far as I know, we do not), my assumptions
are the following:

Benefits: we could pick up different (and hopefully better) access
paths with current RUNSTATS due to DB2 maintenance or version
upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing
RUNSTATS on a tablespace that has not been updated since the last
time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
_____

Roy Boxwell

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Bill Gallagher)
Hi!



I am a great fan of never running RUNSTATS... the risk of access path
regression after a RUNSTATS is too great – even if the data has not changed
– How is that possible? What I have seen many time is the following:



COLGROUPs are created, Indexes are created, RUNSTATS are correctly run.

An index is dropped, changed or a COLGROUP is no longer required (see
statsfeedback)

The dropping of this non-used index or non-used COLGROUP (Or even non-used
FREQVAL) removes a large piece of information that the optimizer actually
did use to decide *not* to use that particular index etc. So you happily get
changed access paths when you drop never used index or colgroups.

This “problem” is made worse by regularly run pointless runstats (I call
them ruinstats!)

If you have IOE or sampling and VARCHAR index columns it gets even worse as
your data changes every run...if you don’t believe me just do a full
runstats on a table and index with at least one IOE and extract and save the
data then run it again – you get different results – Nothing earthz
shattering but there is a tipping point for the optimizer somewhere...



Simple rule of thumb:



Only do a RUNSTATS once the amount of data has changed by
two orders of magnitude (up or down of course!)

Or

A Schema change (ALTER, CREATE INDEX etc) or a COLGROUP
change has occurred



If the data has not changed for two years then a new runstats could well do
new things as there are always updates to utilities and especially to the
optimizer, however the runstats updates occur much less often than the
optimizer. So the question is : Do you want them? If so, you must preCheck
all of your SQL by doing a fake REBIND and compare access paths as this
could well be worth it and is a lot cheaper than doing a RUNSTATS and a
REBIND.



Roy Boxwell

SOFTWARE ENGINEERING GMBH and SEGUS Inc.
-Product Development-

Heinrichstrasse 83-85
40239 Duesseldorf/Germany
Tel. +49 (0)211 96149-675
Fax +49 (0)211 96149-32
Email: <mailto:[login to unmask email]> [login to unmask email]
http://www.seg.de http://www.seg.de

Software Engineering GmbH
Amtsgericht Düsseldorf, HRB 37894
Geschäftsführung: Gerhard Schubert, Ulf Heinrich



From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 6:51 PM
To: [login to unmask email]
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables



Good afternoon,



We’re looking into enhancing the automation of our RUNSTATS processing and
utilizing the data in the RTS tables to determine if and when RUNSTATS needs
to be run for any particular tablespace.



We’re looking at three basic conditions:



1) RUNSTATS has never been run for a tablespace

2) There has been a significant amount of insert/update/delete activity
against a tablespace since the last execution of RUNSTATS

3) RUNSTATS hasn’t been run in “x” number of months



That last condition is being thrown in as a “catch-all” condition, just to
make sure that we don’t have any stale statistics (which we currently do).



A question came up about the impact of doing that. Basically, is there any
benefit to running RUNSTATS against tablespaces that contain very low levels
of or no changed data, and conversely, is there any harm in doing so?



Assuming that we have no odd circumstances which cause access paths to
misbehave when the tablespaces or indexes that they access are updated by
RUNSTATS (as far as I know, we do not), my assumptions are the following:



Benefits: we could pick up different (and hopefully better) access paths
with current RUNSTATS due to DB2 maintenance or version upgrades.



Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS
on a tablespace that has not been updated since the last time RUNSTATS was
executed.



Am I missing anything?



What are others doing for running RUNSTATS on a regular basis?



Thanks.



Bill Gallagher | Senior Systems Engineer, DBA | Data Administration

_____

This message (including any attachments) may contain confidential,
proprietary, privileged and/or private information. The information is
intended to be for the use of the individual or entity designated above. If
you are not the intended recipient of this message, please notify the sender
immediately, and delete the message and any attachments. Any disclosure,
reproduction, distribution or other use of this message or any attachments
by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201

-----End Original Message-----

Attachments

  • smime.p7s (5.1k)

Roy Boxwell

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Lockwood Lyon)
Well, in Db2 10 and above you can use TABLESAMPLE for UTS spaces that does
indeed really sample, unlike SAMPLE that does not... Clear??



Roy Boxwell

SOFTWARE ENGINEERING GMBH and SEGUS Inc.
-Product Development-

Heinrichstrasse 83-85
40239 Duesseldorf/Germany
Tel. +49 (0)211 96149-675
Fax +49 (0)211 96149-32
Email: <mailto:[login to unmask email]> [login to unmask email]
http://www.seg.de http://www.seg.de

Software Engineering GmbH
Amtsgericht Düsseldorf, HRB 37894
Geschäftsführung: Gerhard Schubert, Ulf Heinrich



From: Lock Lyon [mailto:[login to unmask email]
Sent: Tuesday, September 11, 2018 1:55 AM
To: [login to unmask email]
Subject: [DB2-L] - RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables



Bill,



IMHO, I think that automating running RUNSTATS is a waste of effort that
implements a costly procedure.



I believe that RUNSTATS gathers statistics by reading every row in the table
/ index(es). If so, it's a costly process both CPU- and I/O-wise.



Consider:

- If you have a lot of statically bound packages, RunStats doesn't
help unless you ReBind; and even then, you need to be careful that access
path(s) don't regress. Do you have a plan management scheme implemented?

- If you have a lot of dynamic SQL, don't you run the risk of
changing access paths without knowing what effects that will have ... until
someone runs a critical on-line application?

- If there has been a significant amount of Ins/Upd/Del activity
since the last execution of RunStats, ... well, if you implement your
automation then volatile tables will get RunStats run regularly. Is this
wise? Didn't execution N of RunStats measure the effects of the updates, and
won't the N+1 execution give you the same info? (Excepting, of course,
cardinality changes.) Oh, and if these tables are volatile, aren't you
running Reorgs? If so, won't you have a Stats specification as part of the
Reorg? Do you really want lots of activity on a table to trigger a RunStats,
and not a Reorg, at least for the indexes?



Last, don't forget publish/subscribe table pairs, replication pairs, and any
self-built clones. These encompass situations where a table in location A is
copied verbatim on a regular basis to location B. Example: Operational
tables thatare copied to the data warehouse, DW dimension tables that are
published to data marts, etc. These target tables may get changed/loaded
daily, yet their contents may not change!



Just some thoughts, may not be applicable to your enterprise.



- Lock Lyon





From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 12:51 PM
To: [login to unmask email] <mailto:[login to unmask email]>
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables



Good afternoon,



We’re looking into enhancing the automation of our RUNSTATS processing and
utilizing the data in the RTS tables to determine if and when RUNSTATS needs
to be run for any particular tablespace.



We’re looking at three basic conditions:



1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity
against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn’t been run in “x” number of months



That last condition is being thrown in as a “catch-all” condition, just to
make sure that we don’t have any stale statistics (which we currently do).



A question came up about the impact of doing that. Basically, is there any
benefit to running RUNSTATS against tablespaces that contain very low levels
of or no changed data, and conversely, is there any harm in doing so?



Assuming that we have no odd circumstances which cause access paths to
misbehave when the tablespaces or indexes that they access are updated by
RUNSTATS (as far as I know, we do not), my assumptions are the following:



Benefits: we could pick up different (and hopefully better) access paths
with current RUNSTATS due to DB2 maintenance or version upgrades.



Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS
on a tablespace that has not been updated since the last time RUNSTATS was
executed.



Am I missing anything?



What are others doing for running RUNSTATS on a regular basis?



Thanks.



Bill Gallagher | Senior Systems Engineer, DBA | Data Administration

_____



-----End Original Message-----

Attachments

  • smime.p7s (5.1k)

Bill Gallagher

DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Lockwood Lyon)
Lock,

Thanks for your comments. Your points are well taken.


1. We are typically not doing rebinds unless there is a well-defined reason to do so, such as table structure changes that have invalidated packages, or to utilize a new index that has been created in response to a known performance issue. At that point, we want somewhat current and accurate statistics in the DB2 catalog for the optimizer to do its job in an informed manner. This automated process will ensure that stats will be current and accurate at that time. Also, we are using versioning and plan management, so we do have fallback capability in the unlikely event of performance regression.
2. Our dynamic SQL seems to be very well behaved, we have no known issues in which a change in statistics has resulted in a degradation in performance. If we ever do come across that, we would handle that on an exception basis. We have a method for exception handling (i.e. excluding certain objects from the process if warranted) designed into this new automation process.
3. We are currently running REORGs on a regular basis. Perhaps too frequently, in fact. We will be looking to address more "intelligent", or conditional REORGs, in the near future. The work we are doing here with RUNSTATS will be coordinated and integrated with the work we will be doing with REORGs so that both will be done in a thoughtful and coordinated manner.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration

From: Lock Lyon <[login to unmask email]>
Sent: Monday, September 10, 2018 7:55 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Bill,

IMHO, I think that automating running RUNSTATS is a waste of effort that implements a costly procedure.

I believe that RUNSTATS gathers statistics by reading every row in the table / index(es). If so, it's a costly process both CPU- and I/O-wise.

Consider:

* If you have a lot of statically bound packages, RunStats doesn't help unless you ReBind; and even then, you need to be careful that access path(s) don't regress. Do you have a plan management scheme implemented?
* If you have a lot of dynamic SQL, don't you run the risk of changing access paths without knowing what effects that will have ... until someone runs a critical on-line application?
* If there has been a significant amount of Ins/Upd/Del activity since the last execution of RunStats, ... well, if you implement your automation then volatile tables will get RunStats run regularly. Is this wise? Didn't execution N of RunStats measure the effects of the updates, and won't the N+1 execution give you the same info? (Excepting, of course, cardinality changes.) Oh, and if these tables are volatile, aren't you running Reorgs? If so, won't you have a Stats specification as part of the Reorg? Do you really want lots of activity on a table to trigger a RunStats, and not a Reorg, at least for the indexes?

Last, don't forget publish/subscribe table pairs, replication pairs, and any self-built clones. These encompass situations where a table in location A is copied verbatim on a regular basis to location B. Example: Operational tables thatare copied to the data warehouse, DW dimension tables that are published to data marts, etc. These target tables may get changed/loaded daily, yet their contents may not change!

Just some thoughts, may not be applicable to your enterprise.

- Lock Lyon


From: Gallagher,Bill R [mailto:[login to unmask email]
Sent: Monday, September 10, 2018 12:51 PM
To: [login to unmask email]<mailto:[login to unmask email]>
Subject: [DB2-L] - DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables

Good afternoon,

We're looking into enhancing the automation of our RUNSTATS processing and utilizing the data in the RTS tables to determine if and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:


1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition, just to make sure that we don't have any stale statistics (which we currently do).

A question came up about the impact of doing that. Basically, is there any benefit to running RUNSTATS against tablespaces that contain very low levels of or no changed data, and conversely, is there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths to misbehave when the tablespaces or indexes that they access are updated by RUNSTATS (as far as I know, we do not), my assumptions are the following:

Benefits: we could pick up different (and hopefully better) access paths with current RUNSTATS due to DB2 maintenance or version upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS on a tablespace that has not been updated since the last time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
________________________________

-----End Original Message-----
________________________________
This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201

J&#248;rn Thyssen

RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Bill Gallagher)

Hi Bill,

You are probably aware that there are vendor tools that can automate this for you.

If you go the roll-your-own route you might want to look at DSNACCOX stored procedure.

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_11.0.0/sqlref/src/tpc/db2z_sp_dsnaccor.html

 

If you want to procrastinate you can consider waiting until Db2 12 and look at the "automatic" RUNSTATS 

https://www.ibm.com/support/knowledgecenter/en/SSEPEK_12.0.0/perf/src/tpc/db2z_automatestatistics.html

In Reply to Bill Gallagher:

Good afternoon,

We're looking into enhancing the automation of our RUNSTATS processing and utilizing the data in the RTS tables to determine if and when RUNSTATS needs to be run for any particular tablespace.

We're looking at three basic conditions:


1. RUNSTATS has never been run for a tablespace
2. There has been a significant amount of insert/update/delete activity against a tablespace since the last execution of RUNSTATS
3. RUNSTATS hasn't been run in "x" number of months

That last condition is being thrown in as a "catch-all" condition, just to make sure that we don't have any stale statistics (which we currently do).

A question came up about the impact of doing that. Basically, is there any benefit to running RUNSTATS against tablespaces that contain very low levels of or no changed data, and conversely, is there any harm in doing so?

Assuming that we have no odd circumstances which cause access paths to misbehave when the tablespaces or indexes that they access are updated by RUNSTATS (as far as I know, we do not), my assumptions are the following:

Benefits: we could pick up different (and hopefully better) access paths with current RUNSTATS due to DB2 maintenance or version upgrades.

Harm: we would be unnecessarily consuming CPU cycles for executing RUNSTATS on a tablespace that has not been updated since the last time RUNSTATS was executed.

Am I missing anything?

What are others doing for running RUNSTATS on a regular basis?

Thanks.

Bill Gallagher | Senior Systems Engineer, DBA | Data Administration
________________________________
This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.

TRVDiscDefault::1201



 

Best regards,

Jørn Thyssen

Rocket Software
77 Fourth Avenue • Waltham, MA • 02451 • USA
E: [login to unmask email] • W: www.rocketsoftware.com 

2018 IBM Champion.

Views are personal. 

Michael Hannan

RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Jørn Thyssen)

Ha ha! I love a controversial topic.

Roy makes some good points about not collecting Runstats, perhaps a little exaggerated. I think the Stale Stats scare has been overplayed. If you don't have dynamic SQL, and no REOPT and you are happy with your Static access paths, you don't need to collect new Stats to be used by REBIND for possible access path change, very frequently.

Once in a while you might want to see what the access paths would be with a new set of Stats to see if some might be better, particularly improved access paths arrive with new DB2 Versions, or new indexes.

Without understanding how Stats get used well,it is hard to make the best decisions really. However lets assume that a table is staying relatively stable then the Stats collected now might be perfectly fine in 1 year from now. Realtime Stats could be used to find out how much activity has occurred and estimate how far wrong are the Stats to decide if we want new ones. 

Certain Stats used to be very important or the Optimizer could make a bad decision. You never want stats that say a table is empty, or DB2 could choose Tablespace scan, so I used to populate for new tables, Stats estimate for the size a table will grow to in the future perhaps in 1 year, as an improvement over empty Stats or default Stats.

In old days,if your Stats said a partition was empty, DB2 Optimizer might choose limited Partition scan, when actually index use is much better.This applies mainly to dynamic SQLs.  Hopefully this issue was fixed.  There is an argument for setting all tables to Volatile or setting NPTHRESH (forget exact spelling) to value to always use indexes. 

Certain Stats a lot more important than others, e.g. KEYCARD Stats were so important that they became mandatory.

For me Stats that are near enough, are good enough. Freq Value Stats can be very useful to show how skewed distributions are but have to understand when DB2 can actually use these Stats. They don't need to be updated frequently. Near enough is good enough. Colcardf again near enough is good enough, don't need to be updated frequently unless big change.

Don't collect Runstats with Reorg, it makes your cluster index always look 100% clustered which is only the reality just after the Reorg.

Don't do REBINDs without a good reason. 

For dynamic SQL optimization,use RTS to work out when changes to a table have been significant enough to get new stats,or Update someof your tables Stats using the RTS Stats if they are reliable.

A lot of sites seem to waste a lot of time doing RUNSTATS when not needed and REBIND when not needed. Then they might not know if their access paths changed or not and was the performance better or worse. You would want to SWITCH back in a hurry if performance went bad.

I would prefer spare CPU to be used to Reorg the disorganised tables, than collecting Stats so frequently.

The Optimizer improves but I still see it choosing very bad access paths on occasions. That is the risk of REBIND with new Stats, or with changing Indexes and REBIND, etc. My favourite bad access paths are anytime MC=0 on access to a table (when there is good alternative) and this could be caused by optimistic Stats, or when a joined table has no match column on the major join column. So I believe,we must detect these access path changes and back them out. 

REBIND with APREUSE should be normal option,to avoid change when we don't want it.

Reasonable Stats are needed for dynamic but many of the Stats don't have to be changed that frequently. Detect the major change that needs new Stats.

DB2 I believe tries to compensate for old HIGH2KEY values in dynamic SQL these days, but that and Histogram Stats could be important for some dynamic SQLs.

Many Filter Factors calculated by DB2 are unrepresentative especially where host variables or parameter markers are used, no matter how frequently you collect Stats, or perhaps are just not collecting the Stats that DB2 can use. 

Range predicate filtering estimates were always bad with hostvars, and even equalspredicate estimates are way off with skewed data,so you cares what your Stats are here? LOL I am considering using Predicate Selectivity Overrides widely to replace the usage of certain Runstats that have lost the plot,and don't currently result in good access path selection. Collecting up to date Runstats just can't solve the problem.It is worth having NPAGESF and CARDF being somehwhere in the right vicinity, to help decide table scan costs, and table join sequence. These could be obtained from realtime Stats without running RUNSTATS, if one wanted to keep them reasonably up to date in the SYSTABLES, however I would not want to decrease them for highly dynamic tables, for fear of avoiding indexes currently used.

Estimated growth in COLARDFs, KEYCARDs,and FULLKEYCARDFs, could be calculated from Realtime stats growth, if desired, since absolutely correct numbers are not that important.

These days one could argue RUNSTATS offload to zIIP makes it largely free, so why not collect it? Fine if one won't use it to do a REBIND (without APREUSE), risking access path change that might be better, neutral or worse, or very badly worse. Just don't collect any very expensive Stats too often.

I would like to see some useful Freq Value Stats collected for skewed columns that were never ever collected, rather than updating some stats that change very little, very often.

I do like to know the Index DATAREPEATFACTORF values from time to time as that affects the estimated cost of table scan using an index perhaps not 100% clustered (in Data getpages incurred). The values of these after Reorg are much less interesting.

When are Stats Stale? The most likely case is a Stat that has never been collected or is 3 years old, and grossly mismatching (much lower value in rows or pages) against the realtime Stats. No need to panic for stable tables.

I find a 3rd party product reporting so many Stale Stats as somewhat irritating. I just don't care for largely Static SQL applications. I only care when it upsets the access path. 

Michael Hannan,

DB2 Application Performance Specialist
CPT Global Ltd

Michael Hannan

RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Michael Hannan)

A word about Stats Feedback.

I think the concept is very good. Many sites are missing very important or useful Stats in their Catlg, and some have conflicting Stats, or even corrupt Stats (typically corrupt Stats were populated by 3rd party products or possibly home made automation).

I don't have a big issue with OLD Stats. Never cleanout Stats just because they are old. Sure can update them with better new Stats. Old Stats are better than nothing, and can be fine if the shape of the data is not changing that much. If OLD Stats show zero rows, that is a big problem unless object is marked Volatile. Zero Stats are pretty useless Stats for the Optimizer (unless guaranteed to never change).

It was always possible to write queries to identify these things. However Stats feed back is reporting Stats it could have been used for a particular access path selection. Great!

The one type of Stats feedback that is irritating me just a little, is the prevalence to report "Stale" Stats essentially determined to be out of sync based on the collect times. We cannot collect all Stats at precisely the same time. Some Stats are O.K. to be quite old on relatively static objects. We don't need exact Stats, only representative Stats that allows the Optimizer to make a good decision, and Stats to show skewed columns is important.

So when you see Stats Feedback that something is Stale, I would take that with a grain of salt. IBM is trying to be on the very safe side here.

For some Stats, very approximate values will do, since calculated Filter factors will be way off anyway. Do I care if the COLCARDF on a non key column is out of date? No big deal at all. If the value is a bit low, all the better. DB2 might choose a bit more pessimistic access path. Freq Value Stats out of date? No big deal as the top frequencies generally don't change that much over time (sure could be special exceptions)

If your CARDF or NPAGESF is far too small for a table, that is a big deal, as DB2 is going to underestimate scan costs far too much. For tables growing fast, I would like values that predict the maximum size of the object over the next year or two, and then don't change it often (if largely a Static SQL environment).

My aim is avoid DB2 Optimizer greatly underestimating the cost of an access path for frequent SQLs. That is what leads to worst performing access paths.

I like the way the Optimizer has been enhanced to do Index non-leaf sampling in special cases, to overcome misleading Stats, especially where Stats indicate zero rows in a table or partition, and the reality is different.

I like to have Frequent Value Stats for all skewed columns, at least the top 3 values (as a minimum) to show skew (and especially knowing percent Null), but they don't have to be collected regularly and don't need to be up to date, if the access paths are not using them (due to hostvars etc.). Least freq. values are never at all interesting to me for access paths, so I recommend to use MOST only.  Suppose some value is represented by a single row only. That is a low frequency (big deal), and sampling collection of Stats could miss it easily, yet another value that is not populated yet, will not have a LEAST frequent row. So I don't see the point in LEAST or BOTH unless have a very specific requirement, for a very specific value.

No need to collect multi-column Freq Values unless you have access paths that could use them. i.e. Dynamic SQL with an equals match on that number of columns (not parameter markers), or REOPTed SQLs. Stats feedback could identify which mult-column COLGROUP Freq Value stats are useful. Some sites are collecting many that don't seem to be useful.

When Stats result in very wrong Filter Factors, Stale or not Stale just does not matter much, but Predicate Selectivity Overrides are a mechanism to overcome this for your high use SQLs with non-optimal access paths, and allow the Optimizer to make a better choice (which might be one you never thought of for an OPTHINT). Sometimes we know the filtering better than the Stats will ever calculate. All part of SQL Access Path Stability for Static SQLs, and even for Dynamic in recent DB2 version. The old OPTHINTs by QUERYNO can die out. IBM have reworked it much better, and includes using REOPT, parallelism, or potential star join on specific SQLs.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 20, 2018 - 07:58 AM (Europe/Berlin)

Walter Jani&#223;en

AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Michael Hannan)
Hi Michael

I do not agree with all you said. If old or stale stats have any impact on the access path decision, who will decide this in general even for a company and you cannot teat each table individually. Statistics recommendations were introduced in V11 and in V12 the new reason STALE was added. The question is: why was it added? I think it was added, because stale statistics had led to problems and even to PMRs. And I also do not agree that old statistics are better than none. I had discussions with the lab that the optimizer should be more rule based in these cases and the reaction on this discussion was that they enhanced the NPGTHRSH-rule, but in my opinion it could be enhanced even more.

I also think that stats feedback are a good thing, but the sheer amount of recommendations should be decreased. Sometimes we get several hundreds of recommendations for a single table, which has one index. Therefore I think that recommendations should only be issued, if the access path can change to the better, but not e.g. if the estimated costs could a adjusted.

I also don’t know, which impact COLCARDF of any non-key-column has on the join-sequence or the join-method.

Lastly: If you use USE PROFILE with your runstats there is no MOST, LEAST or BOTH clause, so the discussion about this is obsolete.

Terry Purcell and I will have a presentation at the IDUG in Malta, where these topics are discussed.

Kind regards
Walter Janißen

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
Tel.: +49(0)211/477-2928
Fax: +49(0)211/477-6441
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Michael Hannan <[login to unmask email]>
Gesendet: Donnerstag, 20. September 2018 07:57
An: [login to unmask email]
Betreff: [DB2-L] - RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables


A word about Stats Feedback.

I think the concept is very good. Many sites are missing very important or useful Stats in their Catlg, and some have conflicting Stats, or even corrupt Stats (typically corrupt Stats were populated by 3rd party products or possibly home made automation).

I don't have a big issue with OLD Stats. Never cleanout Stats just because they are old. Sure can update them with better new Stats. Old Stats are better than nothing, and can be fine if the shape of the data is not changing that much. If OLD Stats show zero rows, that is a big problem unless object is marked Volatile. Zero Stats are pretty useless Stats for the Optimizer (unless guaranteed to never change).

It was always possible to write queries to identify these things. However Stats feed back is reporting Stats it could have been used for a particular access path selection. Great!

The one type of Stats feedback that I irritating me just a little, is the prevalence to report "Stale" Stats essentially determined to be out of sync based on the collect times. We cannot collect all Stats at precisely the same time. Some Stats are O.K. to be quite old on relatively static objects. We don't need exact Stats, only representative Stats that allows the Optimizer to make a good decision, and Stats to show skewed columns is important.

So when you see Stats Feedback that something is Stale, I would take that with a grain of salt. IBM is trying to be on the very safe side here.

For some Stats, very approximate values will do, since calculated Filter factors will be way off anyway. Do I care if the COLCARDF on a non key column is out of date? No big deal at all. If the value is a bit low, all the better. DB2 might choose a bit more pessimistic access path. Freq Value Stats out of date? No big deal as the top frequencies generally don't change that much over time (sure could be special exceptions)

If your CARDF or NPAGESF is far too small for a table, that is a big deal, as DB2 is going to underestimate scan costs far too much. For tables growing fast, I would like values that predict the maximum size of the object over the next year or two, and then don't change it often (if largely a Static SQL environment).

My aim is avoid DB2 Optimizer greatly underestimating the cost of an access path for frequent SQLs. That is what leads to worst performing access paths.

I like the way the Optimizer has been enhanced to do Index non-leaf sampling in special cases, to overcome misleading Stats, especially where Stats indicate zero rows in a table or partition, and the reality is different.

I like to have Frequent Value Stats for all skewed columns, at least the top 3 values (as a minimum) to show skew (and especially knowing percent Null), but they don't have to be collected regularly and don't need to be up to date, if the access paths are not using them (due to hostvars etc.). Least freq. values are never at all interesting to me for access paths, so I recommend to use MOST only. Suppose some value is represented by a single row only. That is a low frequency (big deal), and sampling collection of Stats could miss it easily, yet another value that is not populated yet, will not have a LEAST frequent row. So I don't see the point in LEAST or BOTH unless have a very specific requirement, for a very specific value.

No need to collect multi-column Freq Values unless you have access paths that could use them. i.e. Dynamic SQL with an equals match on that number of columns (not parameter markers), or REOPTed SQLs. Stats feedback could identify which mult-column COLGROUP Freq Value stats are useful. Some sites are collecting many that don't seem to be useful.

When Stats result in very wrong Filter Factors, Stale or not Stale just does not matter much, but Predicate Selectivity Overrides are a mechanism to overcome this for your high use SQLs with non-optimal access paths, and allow the Optimizer to make a better choice (which might be one you never thought of for an OPTHINT). Sometimes we know the filtering better than the Stats will ever calculate. All part of SQL Access Path Stability for Static SQLs, and even for Dynamic in recent DB2 version. The old OPTHINTs by QUERYNO can die out. IBM have reworked it much better, and includes using REOPT, parallelism, or potential star join on specific SQLs.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

Joe Geller

RE: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Walter Janißen)

Hi Walter,

You said "
I also don’t know, which impact COLCARDF of any non-key-column has on the join-sequence or the join-method."

I think this scenario would cover that.  In general it is good to filter early.  For nested loop joins, Db2 tends to prefer the smaller table as the outer table.  If you have 2 tables of equal size and they each have a predicate on a non key column, the one with the smaller filter factor would likely be chosen as the outer table.  If the predicates use a host variable the FF will be based on COLCARDF.  If tableA has a (correct) FF of .3 and tableB has a stale COLCARDF of 2 (FF=.5), Db2 would pick tableA as the outer.  But if there are now 10 values for that column in tableB, the real FF would be .1 and tableB should be the outer table.

Joe

In Reply to Walter Janißen:

Hi Michael

I do not agree with all you said. If old or stale stats have any impact on the access path decision, who will decide this in general even for a company and you cannot teat each table individually. Statistics recommendations were introduced in V11 and in V12 the new reason STALE was added. The question is: why was it added? I think it was added, because stale statistics had led to problems and even to PMRs. And I also do not agree that old statistics are better than none. I had discussions with the lab that the optimizer should be more rule based in these cases and the reaction on this discussion was that they enhanced the NPGTHRSH-rule, but in my opinion it could be enhanced even more.

I also think that stats feedback are a good thing, but the sheer amount of recommendations should be decreased. Sometimes we get several hundreds of recommendations for a single table, which has one index. Therefore I think that recommendations should only be issued, if the access path can change to the better, but not e.g. if the estimated costs could a adjusted.

I also don’t know, which impact COLCARDF of any non-key-column has on the join-sequence or the join-method.

Lastly: If you use USE PROFILE with your runstats there is no MOST, LEAST or BOTH clause, so the discussion about this is obsolete.

Terry Purcell and I will have a presentation at the IDUG in Malta, where these topics are discussed.

Kind regards
Walter Janißen

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
Tel.: +49(0)211/477-2928
Fax: +49(0)211/477-6441
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Michael Hannan <[login to unmask email]>
Gesendet: Donnerstag, 20. September 2018 07:57
An: [login to unmask email]
Betreff: [DB2-L] - RE: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables


A word about Stats Feedback.

I think the concept is very good. Many sites are missing very important or useful Stats in their Catlg, and some have conflicting Stats, or even corrupt Stats (typically corrupt Stats were populated by 3rd party products or possibly home made automation).

I don't have a big issue with OLD Stats. Never cleanout Stats just because they are old. Sure can update them with better new Stats. Old Stats are better than nothing, and can be fine if the shape of the data is not changing that much. If OLD Stats show zero rows, that is a big problem unless object is marked Volatile. Zero Stats are pretty useless Stats for the Optimizer (unless guaranteed to never change).

It was always possible to write queries to identify these things. However Stats feed back is reporting Stats it could have been used for a particular access path selection. Great!

The one type of Stats feedback that I irritating me just a little, is the prevalence to report "Stale" Stats essentially determined to be out of sync based on the collect times. We cannot collect all Stats at precisely the same time. Some Stats are O.K. to be quite old on relatively static objects. We don't need exact Stats, only representative Stats that allows the Optimizer to make a good decision, and Stats to show skewed columns is important.

So when you see Stats Feedback that something is Stale, I would take that with a grain of salt. IBM is trying to be on the very safe side here.

For some Stats, very approximate values will do, since calculated Filter factors will be way off anyway. Do I care if the COLCARDF on a non key column is out of date? No big deal at all. If the value is a bit low, all the better. DB2 might choose a bit more pessimistic access path. Freq Value Stats out of date? No big deal as the top frequencies generally don't change that much over time (sure could be special exceptions)

If your CARDF or NPAGESF is far too small for a table, that is a big deal, as DB2 is going to underestimate scan costs far too much. For tables growing fast, I would like values that predict the maximum size of the object over the next year or two, and then don't change it often (if largely a Static SQL environment).

My aim is avoid DB2 Optimizer greatly underestimating the cost of an access path for frequent SQLs. That is what leads to worst performing access paths.

I like the way the Optimizer has been enhanced to do Index non-leaf sampling in special cases, to overcome misleading Stats, especially where Stats indicate zero rows in a table or partition, and the reality is different.

I like to have Frequent Value Stats for all skewed columns, at least the top 3 values (as a minimum) to show skew (and especially knowing percent Null), but they don't have to be collected regularly and don't need to be up to date, if the access paths are not using them (due to hostvars etc.). Least freq. values are never at all interesting to me for access paths, so I recommend to use MOST only. Suppose some value is represented by a single row only. That is a low frequency (big deal), and sampling collection of Stats could miss it easily, yet another value that is not populated yet, will not have a LEAST frequent row. So I don't see the point in LEAST or BOTH unless have a very specific requirement, for a very specific value.

No need to collect multi-column Freq Values unless you have access paths that could use them. i.e. Dynamic SQL with an equals match on that number of columns (not parameter markers), or REOPTed SQLs. Stats feedback could identify which mult-column COLGROUP Freq Value stats are useful. Some sites are collecting many that don't seem to be useful.

When Stats result in very wrong Filter Factors, Stale or not Stale just does not matter much, but Predicate Selectivity Overrides are a mechanism to overcome this for your high use SQLs with non-optimal access paths, and allow the Optimizer to make a better choice (which might be one you never thought of for an OPTHINT). Sometimes we know the filtering better than the Stats will ever calculate. All part of SQL Access Path Stability for Static SQLs, and even for Dynamic in recent DB2 version. The old OPTHINTs by QUERYNO can die out. IBM have reworked it much better, and includes using REOPT, parallelism, or potential star join on specific SQLs.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

Michael Hannan

RE: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Walter Janißen)

In Reply to Walter Janißen:

Hi Michael

I do not agree with all you said. If old or stale stats have any impact on the access path decision, who will decide this in general even for a company and you cannot teat each table individually. Statistics recommendations were introduced in V11 and in V12 the new reason STALE was added. The question is: why was it added? I think it was added, because stale statistics had led to problems and even to PMRs. And I also do not agree that old statistics are better than none. I had discussions with the lab that the optimizer should be more rule based in these cases and the reaction on this discussion was that they enhanced the NPGTHRSH-rule, but in my opinion it could be enhanced even more.

I also think that stats feedback are a good thing, but the sheer amount of recommendations should be decreased. Sometimes we get several hundreds of recommendations for a single table, which has one index. Therefore I think that recommendations should only be issued, if the access path can change to the better, but not e.g. if the estimated costs could a adjusted.

I also don’t know, which impact COLCARDF of any non-key-column has on the join-sequence or the join-method.

Most of the categories for Stats Feedback are mentioning problems that are quite obvious for completely missing Stats, conflicting Stats etc. My experience was that the overly abundant ones were of the Stale category, however I could see this could vary at different sites. I was probably looking at sites doing very complete Stats collection yet still had a lot of Stale reported.

I agree default Stats are better than Zero Stats, but otherwise it is hard to say old Stats are not better than nothing. Great examples are COLCARDF and Freq Value Stats. If COLCARDF was 5, 3 years ago, in a 100K row table, that is not bothering me much if COLCARDF is now actually 7 or 8. It won't stuff up Filter factors much, and may even improve them. If one value had 70% of the rows 1 years ago, I would hate to see that Stat just removed, even temporarily, without replacing it with more up to date Stats. That is why I say, do not clean out old Stats that are not causing any problems, if there is any way that the information could be useful (and that maybe unknown at times).  If they are very wrong or corrupt, O.K. update/replace them. If it known that the Stats cannot be possibly used by DB2 or anyone, then that is a candidate for removal. Best example would Multi column Frequent value Stats on NUMCOLUMNS=8 when there are never more than 4 columns matched in the access paths, and only 3 equals matched.

In all cases it matters, how DB2 will use the Stats in an access path determination, or will not be able to use some of the Stats.

In the distant past, before Predicate Selectivity Overrides became available, I was routinely updating COLCARDF values, for columns with range predicates, to greatly increase the filter factor (or minimise selectivity assumed). This never seemed to do a lot of harm, and helped access paths. These were not typically join columns.

Recently we had a case where a join went to the large table first with full equals unique  match on the join column (hitting exactly one row) and then joined to the small table (not a unique match), in most cases returning a not found result. Best access path is to go to the small table first, mostly experience a not found and save the join cost. Optimizer cannot estimate less than one row to be found, so only obvious solution is rewrite the SQL. Not sure if DB2 will even accept an OPTHINT in this case. So despite the COLCARDF on smaller table being much lower than outer table, the OPTIMIZER cannot deduce extreme filtering on the small table. The COLCARDF on the small table is not resulting in a sensible FF in this case. Even a Predicate Selectivity Override could not stop the Optimizer's priority to go to a Unique Key Equals matched first. A weird case, but the Stats were virtually irrelevant to the chosen access path. Almost no amount of Stats change could make a difference to get the right access path. At least we understood why the Stats values make no difference. The resulting access path was no really bad, but simply not Optimal. Any site can have quite a lot of not quite optimal access paths that don't really get noticed unless performance is quite poor, not so much for single table SQLs, but more so for complex SQLs.

For static access paths, do we want the Optimizer to keep changing join sequence in an access path, for a possible 5 or 10% improvement, due to the latest slight change in the Stats? I doubt it. New join sequence might not understand table relative clustering and join more randomly, resulting in a large degradation. Generally we want to change access paths to get big improvements, like 30 to 99%, when we are clear that new path can be much better, or ready to back out when measurement shows it not to be better. I like Static access path stability, once we are happy with them, or otherwise use REOPT or Degree ANY for expensive scanning queries, or the latest V12 Adaptive MX access paths which are like a much cheaper semi REOPT, to choose which index is best to be first used at run time.

I don't know yet how to best use Stats Profiles (due to lack of hands on experience with it), so I hope your presentation will give some insight. I am certainly reading about it, but sometimes nothing can really replace trying things. Just like most IDUG  Presentations about DB2 zOS V4 Outer Join were greatly misleading due to lack of hands on experience with using it. LOL V6 delivered the "real" Outer Join.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 21, 2018 - 08:42 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 21, 2018 - 08:51 AM (Europe/Berlin)

Terry Purcell

RE: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Michael Hannan)

I would like to point out that I have a considerable amount of agreement with regard to some of the views put forward by both Walter and Michael in this discussion around RUNSTATS feedback. This is a difficult discussion to have via email or Db2-L post - because each individual topic may require further (lengthy) back and forth discussion.

We do accept that there will be some situations where recommendations will not influence the access path. But the belief is that the vast majority of recommendations will help the optimizer improve its decision making and thus improve the stability/reliability of access path choices.

Walter has legitimately pointed out cases where valid RUNSTATS collection practices can cause STALE recommendations. At this stage we do not have enough customer feedback to see whether excessive STALE recommendations are common - and thus whether Db2 needs to improve (which could mean in the recommendations or in RUNSTATS or inline stats), or whether customers could benefit from improved RUNSTATS practices. More feedback would be beneficial - although we do risk in this thread getting insufficient detail to make an accurate determination of the reason(s).

With regard to determining whether the access path would change - without additional feedback loop(s) - this is difficult. But something we want to investigate further. And regarding the comment about the benefit of improving an access path, but less benefit to simply improving a cost estimate - we do have some customers who utilize predictive governor (RLF) to avoid query execution if the query cost is too high - and thus for those cases, there is a benefit to improving an estimate even if it does not change the access path.

This is a complex topic. And the Db2 11 & 12 enhancements are moving us in a better direction - but I accept there are areas to improve.

Regards

Terry Purcell

Walter Jani&#223;en

AW: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Terry Purcell)
Hi Terry

Good point: “we do have some customers who utilize predictive governor (RLF) to avoid query execution if the query cost is too high - and thus for those cases, there is a benefit to improving an estimate even if it does not change the access path.”
I haven’t thought of that and will try to watch for, if there are any changes in the estimated costs, but I think we both know that these are really estimates and stay estimates.

My experience with recommendations are: if the access path is not optimal and I gathered the recommended stats, most of the time, it did not change. But I will still collect them, because who knows, if it doesn’t affect the access path choice.

Let’s see, if there will be any discussion coming up during our presentation at the IDUG in Malta.

Kind regards
Walter Janißen [standard_IBM+Champ+7+Yr+Analytics]

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Terry Purcell <[login to unmask email]>
Gesendet: Montag, 24. September 2018 20:53
An: [login to unmask email]
Betreff: [DB2-L] - RE: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables


I would like to point out that I have a considerable amount of agreement with regard to some of the views put forward by both Walter and Michael in this discussion around RUNSTATS feedback. This is a difficult discussion to have via email or Db2-L post - because each individual topic may require further (lengthy) back and forth discussion.

We do accept that there will be some situations where recommendations will not influence the access path. But the belief is that the vast majority of recommendations will help the optimizer improve its decision making and thus improve the stability/reliability of access path choices.

Walter has legitimately pointed out cases where valid RUNSTATS collection practices can cause STALE recommendations. At this stage we do not have enough customer feedback to see whether excessive STALE recommendations are common - and thus whether Db2 needs to improve (which could mean in the recommendations or in RUNSTATS or inline stats), or whether customers could benefit from improved RUNSTATS practices. More feedback would be beneficial - although we do risk in this thread getting insufficient detail to make an accurate determination of the reason(s).

With regard to determining whether the access path would change - without additional feedback loop(s) - this is difficult. But something we want to investigate further. And regarding the comment about the benefit of improving an access path, but less benefit to simply improving a cost estimate - we do have some customers who utilize predictive governor (RLF) to avoid query execution if the query cost is too high - and thus for those cases, there is a benefit to improving an estimate even if it does not change the access path.

This is a complex topic. And the Db2 11 & 12 enhancements are moving us in a better direction - but I accept there are areas to improve.

Regards

Terry Purcell

-----End Original Message-----
Attachments

  • image001.png (2.6k)

Michael Hannan

RE: AW: AW: DB2 z/OS v11: RUNSTATS on Static (Unchanging) Tables
(in response to Walter Janißen)

In Reply to Walter Janißen:

Good point: “we do have some customers who utilize predictive governor (RLF) to avoid query execution if the query cost is too high - and thus for those cases, there is a benefit to improving an estimate even if it does not change the access path.”
I haven’t thought of that and will try to watch for, if there are any changes in the estimated costs, but I think we both know that these are really estimates and stay estimates.

My experience with recommendations are: if the access path is not optimal and I gathered the recommended stats, most of the time, it did not change. But I will still collect them, because who knows, if it doesn’t affect the access path choice.

DB2 has a Predicative Governor because someone requested it, right? Does it do a good job? For simplest SQLs the estimated cost may not be too far out. For many SQLs the predicated cost will be misleading (slightly or hugely) as at today, due to not having all the stats possible to be collected, more so than because they are bit out of date. Stats are more important for dynamic SQL though granted. Is subquery filtering known accurately to the Optimizer for dynamic, certainly not for static.

I am doubting that keeping Stats right up to date is going to make big improvements to cost estimates. The real point is Stats need to be good enough to pick the best or reasonable access path. I may need to spend more time to investigate Stale Stats, but so far was not really interested in it (at a mainly Static SQL site).

I think absent Stats is going to make a much bigger difference, e.g. no Frequent value stats for the column where we supply the value, or no Histogram Stats. I can see old value for HIGH2KEY can upset a dynamic SQL if is used directly for range predicate filter factors. I assume DB2 now uses Non-leaf sampling in case where required range is outside the LOW2KEY to HIGH2KEY range. I may be wrong, have to check the red book again. If TABSTATS say a partition is empty I think that will trigger Optimizer to do smart recheck too rather than just scan the partition (dangerous).

I had annoying personal experience with predictive governing, with very complex queries on the site DB2 Acctng Data tables. Other than the most trivial queries, my SPUFI dialogue would give me a screen warning me I would exceed some limit set, and I had to type in the word CONTINUE for every single query I ran virtually, other than extreme trivial ones. So it wasn't set to stop me querying only put a hurdle in my path in this setup.

Is it quite likely the settings were inappropriate for me. I wonder if others have really good experiences with predictive governing. I don't see many sites wanting to use it so far. 

I do believe in Resource Limiting  to stop runaway queries going for several days LOL for a Cartesian product or mismatching datatype join. I like that resource Limiting for Static comes in too.

Predictive Governing I am not too sure about. Have not seen it enough I think.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd