Average synchronous I/O Wait in your company?

Larry Kintisch

Please see response below.

At 07:39 AM 12/13/01 +0100, you wrote:
>How much does a synchronous I/O Wait cost?
>This question is beyond bufferpools and index optimisation!
>I know, it depends. It depends on hardware, on workload, on cache and so on.
>There is an old rule of thumb: 20 milliseconds.
>Maybe this rule is already 20 years old and still working.
>Recently we had a discussion with our system folk.
>We use ESS and he told me, with cache it takes 3 milliseconds
>and without it takes 30 milliseconds.
>With a cache hit ratio of 50% it takes us 17 milliseconds.
>Some analysis of accounting reports (SYNCH I/O AVG.)
>showed me numbers like 16, 17 milliseconds.
>Still, I think, it's too much and I would like to get feedback from you.
>What is your average synchronous I/O wait in your company?
>How do you measure it?
>And are you happy with it?
>Kind regards
>Hans-Ulrich Blumer
>Informatik Schweiz
>Software Entwickler-Support
>Mailto:[login to unmask email]
>Tel +41 52 26124 16
>Fax +41 52 261 83 70
>Winterthur Versicherungen
>Konradstrasse 14
>CH-8401 Winterthur

I passed this question on to Tapio Lahdenmaeki, who is an IBM Finland
systems engineer, DB2 expert and author of a few DB2 performance courses
for IBM Learning Services. [See his note about the new Systems Performance
course CG88.] He asked me to pass this reply on to the list and forward
to him any replies. Larry Kintisch

----Tapio writes:
Let me start with a recent example from a large customer using ESS with
the new 36 GB drives (10,000 RPM):

DB2PM Accounting Report shows 5 ms for the average synchronous read
(4.1 x 5 ms ≈ 21 ms per average transaction)
The total database buffer pool in each datasharing member is about 1.5 GB
and the total disk cache size is 24 GB.
I do not know the DB read cache hit ratio, but I guess it could be
something like 60%.

RMF reports 2.5 ms in AVG RESP TIME but that, of course, contains lots of
write hits and other cache-friendly stuff.

Then, some ESS numbers:

The average seek time is 5 ms and half a rotation takes 3 ms (10,000 RPM)
so the drive service time is 8 ms. I'm not sure about the pool-cache
time but some recent benchmarks show numbers around 1 ms. Assuming that,
the minimum time for a random read from disk (drive) is 9 ms.
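The arithmetic above can be sketched in a few lines of Python (a minimal illustration of the numbers in the post; the variable names are mine, not from any IBM tool):

```python
# Minimum time for a random read from a 10,000 RPM drive,
# using the figures quoted in the post.
rpm = 10_000
half_rotation_ms = 60_000 / rpm / 2       # 3 ms average rotational latency
avg_seek_ms = 5.0                          # average seek time
drive_service_ms = avg_seek_ms + half_rotation_ms   # 8 ms drive service time

pool_cache_ms = 1.0                        # assumed pool-cache time (benchmarks ~1 ms)
min_random_read_ms = drive_service_ms + pool_cache_ms

print(min_random_read_ms)                  # 9.0
```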
The RMF report of the customer showed many volumes with AVG RESP TIME
around 1 ms, the lowest is 0.7 ms.

With PAV, IOSQ time should be 0.0. This is true in the customer report.
They had 4 PAVs (three alias addresses for each logical volume).

The most variable component is, of course, disk drive queuing time. I
believe the old queuing formula Q = (u / (1-u)) x S is still useful.
Q is the average queuing time, u is drive utilization (drive busy) and S is
the average service time.

For a random-only, read-only workload S is 8 ms, but RAID-5 writes and
sequential staging have longer service times.

Let's assume S = 10 ms and u = 0.25. Then, Q = 0.25 / (1 - 0.25) x 10 ms ≈ 3 ms

The total time for a random read from disk drive is then

S + Q + transfer = 8 ms + 3 ms + 1 ms = 12 ms

If the DB cache read hit ratio is 60%, the average time for a synchronous
read is 0.6 x 1 ms + 0.4 x 12 ms ≈ 5 ms
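The whole back-of-envelope model can be put together as a short Python sketch (values taken from the post; the function name is mine). Note the post rounds each intermediate step, so the exact figures come out slightly higher: about 3.3 ms of queuing and about 5.5 ms average.

```python
def queueing_time(u, s):
    """Classic queueing rule of thumb: Q = (u / (1 - u)) * S."""
    return u / (1.0 - u) * s

# Assumed average service time, including RAID-5 writes and sequential staging
S = 10.0                         # ms
u = 0.25                         # drive utilization (drive busy)
Q = queueing_time(u, S)          # ~3.3 ms (the post rounds this to 3 ms)

# Random read from the drive: service + queue + transfer
drive_service = 8.0              # ms (5 ms seek + 3 ms half rotation)
transfer = 1.0                   # ms
random_read = drive_service + Q + transfer   # ~12.3 ms (post: 12 ms)

# Average synchronous read with a 60% disk-cache hit ratio,
# assuming ~1 ms for a cache hit
hit_ratio = 0.6
cache_hit_ms = 1.0
avg_sync_read = hit_ratio * cache_hit_ms + (1 - hit_ratio) * random_read

print(round(Q, 1), round(random_read, 1), round(avg_sync_read, 1))
```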

More details in the new IBM course CG881 DB2 UDB for z/OS and OS/390
System Performance Analysis
First U.S. class March 5-8, New York
Audience: DB2 specialists
Instructors: Bernhard Baumgartner and Tapio Lahdenmaki (the authors)
Duration: 3.5 days (Tuesday morning to Friday 1:00 PM)

Best Regards, Tapio

Tapio Lahdenmäki

[login to unmask email]
Notes Mail: Tapio [login to unmask email]
Tel: +358 9 4594536

Larry Kintisch, Pres. e-mail: [login to unmask email]
Able Information Services phone: (845)-353-3809
"DB2, QMF and Data Modeling"
208 Hilltop Drive PO Box 809
Nyack NY 10960-0809

Joel Goldstein

Re: Average synchronous I/O Wait in your company?
(in response to Larry Kintisch)
Performance in the field varies dramatically, and ESS is no exception.
Very few installations are seeing anything like 5 ms avg response time for
all of their logical devices - one or two select ones, maybe..

A cache miss on these devices still provides worse performance than a
well-running 3390, and this is true on all the RAID-type boxes.
You are getting data from more than one spinning disk, so access time to
(only) one device does not really apply... you have the "opportunity" for
RPS misses on multiple spinning disks.

Some datasets will have a good cache hit rate and provide good response
time. Those that are very large and very random will never provide good
performance, because there will almost always be a miss in the VPs, a miss
in the GBPs, and a miss in the DASD cache.

There is usually a vast difference between controlled laboratory tests and
the varied workloads in "large" production environments. The larger the
production system, the more problems you still find.....

PAV provides huge relief from the pre-existing queueing delays when it is
implemented correctly; however, DASD performance problems still prevail in
every large installation I see. It's a huge advance and a major
improvement, and still not the silver bullet the hardware vendors would
like you to believe it is...

You can still kill a Shark.....

Whatever performance your friend across the street is seeing..... yours
will be different.