[DB2 Z/OS] Bufferpool clusters

Mike Holmans

[DB2 Z/OS] Bufferpool clusters
OK, so I use my Bufferpool Tool and I come up with an arrangement of objects in various pools, using the helpful groupings provided by the cluster analysis.

And lo and behold, the I/O rates drop and the CPU comes down and everyone is happy.

My only problem is that cluster analysis is just so much mumbo-jumbo to me. I don't really understand why it works.

I see the point of moving all those small indexes and reference tables to a separate pool since they might as well be resident and not compete for pages, but I don't think I really understand why grouping larger objects on the basis of having similar working set sizes should be helpful.

Why is it better to have one pool with a bunch of objects competing for their 5000 page average working sets and another for similarly-accessed but larger objects which want to compete for 40,000 page working sets? Isn't DB2's bufferpool management good enough to cope with everyone getting a fair slice of a combined pool? Or am I completely missing the point of cluster analysis?

I'm not criticising or being sceptical here. I merely want to understand the theory behind what obviously works in practice.

Mike Holmans
BT OneIT Operational Integrity
[login to unmask email]

Hugh Lapham

Re: [DB2 Z/OS] Bufferpool clusters
(in response to Mike Holmans)
One quick and easy answer . . .
Do they have one or more express lines at the grocery store? Do they
have a dedicated business teller or two at the bank? Similarly with
traffic lanes .... separate lanes for trucks / cars / buses / bicycles.
If you want something more scientific, I'm sure there are several people on
the list who could write (and have written) books on the subject ;-))

Joel Goldstein

Re: [DB2 Z/OS] Bufferpool clusters
(in response to Hugh Lapham)
Hi Mike,

I wish there were a simple one- or two-liner that would make everything crystal clear, but it's a bit more complex
than that. So let's give it a shot and see how it sits with you, and we can go around a few more times if necessary.
For other readers' edification: Mike's question is directed specifically at the cluster analysis feature of
Buffer Pool Tool's simulation/prediction results.

Taking one of your last questions first:
"Isn't DB2's bufferpool management good enough to cope with everyone getting a fair slice of a combined pool?"
Obviously it is not, or there wouldn't be any need for, or gain from, splitting objects into different pools. It would be far too complex,
and too much overhead, for DB2 to continually manage different sets of objects in different ways within the same pool, beyond
the current random/sequential distinction and LRU queues. Sampling at specific, defined intervals has no statistical validity,
and violates National Bureau of Standards techniques for statistical sampling.

The purpose of the Cluster Analysis function of Buffer Pool Tool is to show you which objects, based upon access method
(random vs. SP, sequential prefetch), logically group together by working set size (which has no relationship to catalog statistics).
Ultimately there are two groups to consider within each access type: the BIG objects, and the rest.
There is no rule of thumb for big/large - a big object is just much larger than most of the others. Every system is different.
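
To make "grouping by working set size" concrete, here's a toy sketch in Python. This is not Buffer Pool Tool's
actual algorithm - the object names, sizes, and the gap-ratio rule are all invented for illustration - but it shows
how objects fall naturally into a tiny-reference cluster, a mid-size cluster, and a big cluster once you sort by wkset:

# Toy 1-D clustering: sort objects by observed working set size and
# start a new group wherever the size jumps by more than a chosen
# ratio. Names, sizes, and the ratio are invented for illustration.

objects = {            # name -> observed working set, in pages
    "REF_CODES": 250,   "CUST_IX1": 4800,  "ITEM_TS": 5500,
    "ORDER_IX1": 5200,  "HIST_TS": 38000,  "ORDER_TS": 42000,
}

def cluster_by_wkset(wksets, gap_ratio=3.0):
    groups, current = [], []
    for name, size in sorted(wksets.items(), key=lambda kv: kv[1]):
        if current and size > gap_ratio * current[-1][1]:
            groups.append(current)      # big jump: start a new cluster
            current = []
        current.append((name, size))
    groups.append(current)
    return groups

for i, group in enumerate(cluster_by_wkset(objects)):
    print(f"cluster {i}:", group)

Run against real wkset numbers, the same idea separates Mike's 5,000-page objects from his 40,000-page objects
automatically, with no fixed threshold needed.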

Now, after that, the next important perspective is wkset growth at larger pool sizes. As an example, we have a pool with 40,000
buffers. Object A, at some point, has a wkset of 30,000 buffers - 75% of the pool. What happens to overall pool performance, and to this
object specifically, if we double the pool to 80,000 buffers? Let's say the max wkset grows to 60,000 buffers.
It continues to monopolize the pool, still consuming (at some or various points) 75% of the pool buffers.

I would expect the important system performance metric, IO rate/sec, to decrease. I would expect (not always true, but...) that this
object would not see a large decrease in its IO rate. BPT graphics will easily show you which objects achieved the greatest reduction
in IO rate. This very large random object is monopolizing the pool resources and hurting the performance of the remaining objects,
which have smaller wksets and could achieve a better residency rate and avoid IOs if their pages weren't being thrown off the LRU queue
by the large random object A.
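
A toy LRU simulation makes the "thrown off the LRU queue" effect visible. To be clear, this sketches a plain LRU
cache only - DB2's real buffer manager also has separate sequential queues, thresholds like VPSEQT, deferred write,
and so on - and the object sizes and access mix below are invented. Object A is the big random "elephant"; object B
has a small hot working set:

# Toy LRU pool: a big random object (A) sharing the pool with a small
# hot object (B) keeps pushing B's pages off the LRU queue. This is a
# plain LRU sketch, not DB2's buffer manager; all numbers are invented.

import random
from collections import OrderedDict

def hit_ratios(pool_pages, trace):
    lru, hits, refs = OrderedDict(), {}, {}
    for obj, page in trace:
        refs[obj] = refs.get(obj, 0) + 1
        key = (obj, page)
        if key in lru:
            lru.move_to_end(key)              # re-reference: move to MRU end
            hits[obj] = hits.get(obj, 0) + 1
        else:
            lru[key] = True                   # page read in (an IO)
            if len(lru) > pool_pages:
                lru.popitem(last=False)       # evict least recently used
    return {o: round(hits.get(o, 0) / refs[o], 3) for o in refs}

random.seed(7)
trace = []
for _ in range(50_000):
    for _ in range(5):                        # 5 A-references per B-reference
        trace.append(("A", random.randrange(300_000)))   # huge, random
    trace.append(("B", random.randrange(2_000)))         # small hot set

print("combined 10,000-page pool:", hit_ratios(10_000, trace))
print("A alone, 8,000 pages:    ", hit_ratios(8_000, [r for r in trace if r[0] == "A"]))
print("B alone, 2,000 pages:    ", hit_ratios(2_000, [r for r in trace if r[0] == "B"]))

On traces like this, splitting the same total memory should leave A's (hopeless) hit ratio about where it was,
while B's climbs toward 100% - the combined pool was spending most of its buffers holding A pages that were never
re-referenced.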

So we move object A out of the pool in simulation mode. We can see how the remaining objects perform at varying pool sizes, and
we can see how many buffers object A needs to achieve decent performance. Every year I see more objects, at more
client sites, that are very large and very random, and continue to have high IO rates with 100,000 buffers and more.
Some will almost always cause an IO - and may not need tens of thousands of buffers devoted to them. If they are going
to need an IO most of the time, so be it: use the memory someplace else where you can get more benefit.
Simulations do not impact the online system, and predict the effect of changes; this avoids mistakes with your production system.
Keep in mind that hit ratios are not useful performance metrics. The IO rate/sec is the only real measure of performance, because
it can be converted into CPU seconds and application elapsed-time delays/costs.
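
That conversion is simple arithmetic once you pick per-IO costs. The numbers below are invented placeholders - real
values depend on your processor and disk subsystem, so substitute measured figures - but the structure of the
calculation is the point:

# Back-of-the-envelope conversion of an IO-rate reduction into CPU and
# elapsed-time savings. All per-IO costs here are assumed placeholders;
# plug in measured values for your own processor and DASD.

ios_avoided_per_sec = 2_000    # e.g., sync reads eliminated by retuning
cpu_ms_per_io       = 0.05     # assumed CPU cost per IO
wait_ms_per_io      = 2.0      # assumed average sync-IO wait time

cpu_sec_per_hour  = ios_avoided_per_sec * cpu_ms_per_io  / 1000 * 3600
wait_sec_per_hour = ios_avoided_per_sec * wait_ms_per_io / 1000 * 3600

print(f"CPU saved:          {cpu_sec_per_hour:,.0f} CPU-seconds/hour")
print(f"Sync-IO wait saved: {wait_sec_per_hour:,.0f} seconds/hour, spread across transactions")

There is no equivalent arithmetic for a hit ratio, which is part of why it isn't a cost you can chase.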

The same approach applies to large SP objects (if they really should be SP, and the access is not the result of bad SQL or poorly
designed indexes). If an object will be scanned all or most of the time, it doesn't need a lot of memory.

One of the greatest misconceptions today, based upon 64-bit architectures and gigabytes of available memory, is
that pool tuning isn't necessary. At least once a week we encounter somebody who thinks they can get good performance
by just throwing memory at the existing system buffer pools. When the pools are undersized to start with, anything will help.
However, it has been proven dozens of times over the past years that oversizing pools does not improve performance; it just
wastes memory. Except for a few rare cases where the access to all objects is mostly random and no individual objects
dominate pool resources, multiple pools - grouping objects by random/sequential access, and then by large wkset vs. the rest
(RAMOS, SAMOS) - is the proven technique for achieving good, and better, performance.
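
As a sketch, that final assignment rule reduces to something like the function below. The 20,000-page threshold is
invented - as noted above, "big" falls out of each system's own cluster analysis, not out of a fixed constant:

# Sketch of the grouping rule described above: split by random vs.
# sequential access first, then by big working set vs. the rest.
# The threshold is a made-up stand-in for what cluster analysis finds.

def pool_for(access, wkset_pages, big=20_000):
    if access == "random":
        return "RAMOS pool" if wkset_pages >= big else "small/medium random pool"
    return "SAMOS pool" if wkset_pages >= big else "small/medium sequential pool"

print(pool_for("random", 42_000))       # -> RAMOS pool
print(pool_for("random", 5_000))        # -> small/medium random pool
print(pool_for("sequential", 38_000))   # -> SAMOS pool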

We have clients that have reduced their IO workload by several thousand per second using this approach.

This is a rather high-level overview of pool tuning, and of using wkset and Cluster Analysis from Buffer Pool Tool
for tuning.

So, I hope this clears up some of your questions, and I'll be happy to discuss this further if necessary.

Thanks,
Joel

Joel Goldstein
Responsive Systems
Buffer Pool Tool for DB2, the worldwide industry standard
Performance software that works......
Predicts Group Buffer Pool performance too!
www.responsivesystems.com
(732) 972-1261

Mike Bell

Re: [DB2 Z/OS] Bufferpool clusters
(in response to Joel Goldstein)
Maybe Joel will reply also, but there is a fair amount of mathematics involved
here.

The non-math answer is that you try to keep the favorite mice away from the
elephants. When a program hitting a big table gets active, you want that table to
compete with other tables that have similar access requirements. The second
target is to keep pages in the buffer pool long enough to reuse them. You
choose the tables to group together so as to reduce the probability of flushing a
page from the buffer pool a few milliseconds before another transaction
needs that page.

The math is related to what I learned as queueing theory (a long time
ago). Every tablespace and indexspace has a frequency of reference, which is
what you group by into bufferpools. Big active tables tend to generate a
lot of random page hits that can push other pages out of the bufferpool.
Indexes on big active tables tend to get lots of reuse, especially at the top
levels.
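
The reuse condition can be made explicit with a little arithmetic: a page is found in the pool only if it is
re-referenced before the LRU queue pushes it out, and its residency time is roughly the pool size divided by the
rate at which new pages enter. A sketch, with all numbers invented:

# Rough reuse test: a page stays useful only if its re-reference
# interval is shorter than its LRU residency time. Numbers are invented.

pool_pages       = 40_000
page_ins_per_sec = 3_000                           # new pages entering the pool
residency_sec    = pool_pages / page_ins_per_sec   # ~13.3 s from MRU to eviction

for name, reref_sec in [("hot index root page", 0.01),
                        ("order detail page",   5.0),
                        ("archive page",        600.0)]:
    verdict = "stays resident" if reref_sec < residency_sec else "flushed before reuse"
    print(f"{name:20s} re-referenced every {reref_sec:>6}s -> {verdict}")

Grouping by frequency of reference puts objects with compatible re-reference intervals together, so one pool's
residency time can be long enough for all of its objects.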

If you want a better explanation, go to Joel Goldstein's web site
http://www.responsivesystems.com/
He has articles posted there that are much more detailed than a three-paragraph
email.

Mike
HLS Technologies


Mike Holmans

Re: [DB2 Z/OS] Bufferpool clusters
(in response to Mike Bell)
Hi Joel,

Thanks for taking the cue.

That reply has cleared up quite a few things for me, but there's one bit
that's still murky to me.

You mention systems with almost wholly random access as being slightly
weird: naturally, I'm currently tuning a system where that appears to be
the case. And that was what I was driving at with the combined-pool
question: if all the access is random (but includes dynamic prefetch),
why wouldn't it be sensible to put all the indexes into one big pool?
And if it wouldn't, why would you do it by working set size rather than
by, say, putting the objects in a list from most frequently used to
least, allocating three pools, and putting 1, 4, 7, 10... from your
ranked list into pool A, 2, 5, 8, 11... into B, and 3, 6, 9, 12... into C? (I
don't discount the possibility that that question is partly nonsensical
because "working set size" and "use frequency" are actually different
ways of naming what is essentially the same thing.)


Mike
