Learning to love DB2 histograms...

One of the more interesting parts (to me :) of the new monitoring information introduced along with the enhanced workload management capabilities of DB2 for LUW 9.5 is the histogram concept. This beastie provides useful information about the distribution of values for different characteristics of things going on within DB2.

Histograms within DB2 are used to contain aggregate information about the value population for a given characteristic. In other words,they give you a high-level view of how the indivdual values spread themselves out across the range of possible values. You lose the ability to see the actual individual value itsefl but you gain perspective on the distribution of values across the entire population.

histogram.jpg

Being aggregates, histogram data collection is very light-weight allowing it to be kept active all the time with very little overhead, assuming that the underlying metric is active as you need the individual values in order to find where it belongs in the distribution! Histogram data is available to be collected at different levels (such as the database, workload, or service class) and you can collect it for all things or, by using the work action set, you can collect it only for a specific subset of activities.

The original set of aggregate information in DB2 9.5 was brought in to enhance our understanding of different characteristics of activities (e.g. SQL and Load) as they come through DB2 and this collection is managed by the AGGREGATE ACTIVITY DATA set of controls. The relevant histograms are:

  • Coordinator activity lifetime (CoordActLifetime) histogram
  • Coordinator activity execution time (CoordActExecTime) histogram
  • Coordinator activity queue time (CoordActQueueTime) histogram
  • Coordinator activity estimated cost (CoordActEstCost) histogram
  • Coordinator activity inter-arrival time (CoordActInterArrivalTime) histogram

As an FYI, we also introduced one other histogram in DB2 9.5 called "ReqExecTime" (controlled by the AGGREGATE REQUEST DATA control) which is intended for very low-level performance analysis of internal DB2 request processing which, while it may (sometimes) fascinate us DB2 developers, has little relevance to real-world usage of DB2 and this is why we never talk about this one although it is documented.

Since DB2 9.5, we have introduced another histogram, UowLifetime, which gathers aggregate information about the elapsed time for the different units of work (UOW), also referred to as transactions, processed by DB2 where processed means the UOW was ended by being committed or rolled back.

Here is the DB2 10.5 documentation link to see some details on the available histograms: histogram_type - Histogram type monitor element

The way that histogram data is harvested from DB2 today is using the Statistics event monitor and the histogram data is populated in the table with the name format of HISTOGRAMBIN_<your event monitor name>. You can learn more about the information put out by the event monitor here.

Given their light-weight nature and the desire for this style of population distribution information when making higher-level decisions, I would not be surprised to see us (DB2) continue to add other uses of the histogram within DB2 LUW as we go forward so I think it is worthwhile that all DB2 users and friends learn at least a little bit about them.

I hope you did today :)

 

A few bonus words on Histogram Templates

If one looks deeper into the DB2 histogram, you will see that all of them are based on the HISTOGRAM TEMPLATE concept which is a standardized 41 "bucket", called a bin, approach to collecting distribution information. The first 40 bins represent a range of values up to a defined maximum with each bin representing a specific subset; the first bin covers the 0-1 range and each subsequent bin represents an increasing range based on a logarimthic scale.

The 41st bin represents any value higher than the highest value defined for the template (i.e. values bigger than the range represented by the 40th bin).

Typically, one doesn't need to adjust or create a histogram template unless the majority of the values are in the upper bins and you want to get better granularity of data but it is nice to know that you can adjust things if you need to!

 

MY TOP HINT:

Remember that the histogram scale is logarithmic! It is very easy to be looking at a bar graph of the bin counts and forget that each bin to the right represents a larger range of values than the one to the left :)

 

MY TOP RESOURCE:

Keith McDonald wrote a very good set of documentation on the DB2 Workload Management Histograms when they were introduced in DB2 9.5 which I recommend as a good starting point for learning about histograms.

Part 1: A gentle introduction to histograms

http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0810mcdonald/dm-0810mcdonald-pdf.pdf

Part 2: Understanding the six histograms of DB2 workload management

http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0810mcdonald2/dm-0810mcdonald2-pdf.pdf

Part 3: Visualizing and deriving statistics from DB2 histograms using SQL

http://download.boulder.ibm.com/ibmdl/pub/software/dw/dm/db2/dm-0810mcdonald3/dm-0810mcdonald3-pdf.pdf

 

0 Comments
Recent Stories
Verson 3.0.4 of IBM Graphical WLM tool now available in developerWorks

A quick summary of available Db2 controls for system resources

Managing resource consumption for multiple databases under the same DB2 instance