Reducing Bursts of Writes in DB2 for z/OS

Over the years DB2 buffer pools have dramatically increased in size to support ever increasing DB2 workloads. It is common today for DB2 installations to have multiple DB2 subsystems/members, each with several large buffer pools. In many cases, an installation will have buffer pools containing many objects which have significant insert/update/delete activity; and, it’s a good bet this configuration is not limited to just production environments. Sooner or later, all the updated pages from all the objects, in all the buffer pools in all the subsystems have to be externalized (or written) to disk. As a result, there can be large numbers of updated pages, from many subsystems’ buffer pools, being written to disk at one time. This burst of (write) I/O activity can cause headaches for the I/O (storage) subsystem which can lead to overall system performance problems impacting more than just DB2 workloads. Let’s look in more detail how DB2 can cause these surges of I/Os; and how we can reduce these I/O bursts and their impact to the system. 

Certain DB2 events can cause bursts of write I/Os to externalize updated buffer pool pages including a stop of DB2, DB2 system checkpoints, or a stop of a database, table space, or index space. A stop of DB2 and DB2 system checkpoints have the largest scope (meaning many objects across many buffer pools are affected) and depending on the page update activity across the objects in all buffer pools, there can be significant numbers of updated pages to be externalized. A stop of a database will only require updated pages of objects in the specified database to be externalized. And, even smaller in scope, a stop of a table space or index space will only require updated pages for the table space & index space datasets (partitions) to be written to disk. Keep in mind the number of updated pages for just a single table space can be significant since a table space can have many parts (datasets). It shouldn’t’t be a surprise that these events can cause bursts of write I/Os from DB2.

But what happens between DB2 & object stops, and DB2 system checkpoints?  Not only are updated pages written to disk because of one of the before mentioned events, updated pages in DB2 buffer pools also need to be written to disk periodically. DB2 buffer pools have write thresholds which determine when DB2 will externalize updated buffer pool pages. The IBM DB2 for z/OS manuals are a good source for learning about the different thresholds and how to configure them (DB2 Managing Performance, DB2 Command Reference, DB2 9 for z/OS: Buffer Pool Monitoring and Tuning). Our discussion will focus on the VDWQT (Vertical Deferred Write Queue Threshold) and how we can use it to control write I/O activity at the buffer pool level. In addition, we’ll illustrate the results of how different VDWTQ settings impact write I/O quantities & frequency. 

What is VDWQT?  

Vertical Deferred Write Queue Threshold is the number of updated pages allowed in a buffer pool per dataset before DB2 initiates writing them to disk. There are two types of values to use when setting VDWQT – a percentage of updated pages in the buffer pool per dataset (VDWQT=5,0), or an exact number of updated pages per dataset (VDWQT=0,128). Each setting type has its pros and cons. Which type of setting to use (percentage of pages, or specific number of pages) depends on the page update (insert, update, delete) activity in the buffer pool, and what type of deferred write behavior you want. 

Buffer pools with low or moderate volume and frequent page re-references will likely function well when the VDWQT is specified as a percentage. We want to keep the updated pages in the pool without writing them because we expect to update them again. This reduces the overall number of write operations. 

Buffer pools with high volume, sequentially inserted objects with little page re-reference, should have a low VDWQT to ‘bleed’ or ‘trickle’ the writes of the updated pages. By contrast, buffer pools with objects that have little update activity and high page re-reference should have a high VDWQT setting. This allows pages to stay in the buffer pool longer, achieving high hit ratios and less read I/Os. The ‘bleeding’ or ‘trickling’ of deferred writes reduces bursts of write I/O activity which makes the write traffic more manageable and efficient for the I/O subsystem.

What happens when VDWQT is reached? 

For each dataset (a pageset can have multiple partitions – each partition is a dataset) that triggers the VDWQT, the least recently used updated pages are queued to be written to disk. Specifically, DB2 will sort 32 to 128 of the least recently used pages (by page number) and place them on the deferred write queue to be written to disk. This is the normal process for each dataset that hits the VDWQT setting in a non-Data Sharing environment. In Data Sharing environments, if a dataset has inter-DB2 Read/Write interest, its updated buffer pool pages are written to the Group Buffer Pool at commit. Updated Group Buffer Pool pages are subject to castout processing which is the group buffer pool equivalent to VDWQT processing.

Externalizing updated buffer pool pages in least recently used order keeps more frequently used pages (like space map pages) in the buffer pool longer to help fulfill page re-reference requests faster. To write 32 pages, DB2 will schedule 1 write operation. Page latches are acquired for each of the 32 pages and held until the write is completed (one write operation latches 32 pages until it completes to the disk controller’s cache). Latching only 32 pages at a time helps reduce page latch times which could otherwise elongate in-DB2 times for other DB2 workloads (units of work) waiting to read those pages. Furthermore, if the page range of the updated 32 pages are within 180 pages of each other (a cylinder boundary), DB2 will write them in the same I/O. In other words, let’s say we have the following random (32) pages on the deferred write queue — page 1, pages 50-54, page 170, pages 190, and pages 200-224. Even though there are only 32 pages on the queue and would normally require 1 write operation, DB2 will schedule 2 writes – one for pages 1, 50-54, 170, and a second for pages 190, 200-224. Buffer pool statistics will show the number of pages written and the number of write I/O operations in a given interval. A poor ratio of buffer pool pages written per write I/O results in excessive inefficient write I/O activity.

Example of how to reduce bursts of buffer pool writes.

Let’s say we have a 500,000 page buffer pool with 1,000 datasets (pageset objects) assigned to this buffer pool. For illustration purposes, let’s also say the update activity of 250 of these datasets have high numbers of updated pages between system checkpoints with enough update activity to hit the VDWQT setting for all 250 datasets. 

If a percentage value was used for VDWQT – say 5% the default (VDWQT=5,0), then you are allowing all datasets in this buffer pool to have 25,000 updated pages before DB2 starts writing the pages to disk. That means we have 250 datasets hitting the threshold so we could have (25,000 x 250) 6,250,000 pages with pending writes before we hit VDWQT for any of the datasets. That obviously won’t fit in the 500,000 page buffer pool which means we are likely to hit the Deferred Write Threshold (DWQT) with a need to write out a lot of pages. That is certainly not what we want.

When we do hit the 5% VDWQT for a dataset, we’d have 25,000 pages that need to be externalized for each dataset that hit that threshold. DB2 is going to schedule 128 of those pages to be written. Meanwhile, the number of unwritten pages will continue to count toward the DWQT. That’s still a significant number of unwritten pages and with our set of high activity datasets we’ll likely continue to hit the VDWQT for multiple datasets. Eventually, we’ll hit DWQT.

When DWQT is reached in our example, DB2 will write out 128 pages from enough datasets to bring the number of updated pages down to 10% below DWQT. That could be all 250 of our very active datasets being written at once (32,000 pages) plus whatever else can be identified for a total of 50,000 pages. The total number of I/O operations would be at least 1,562. All of these write operations keep the disk channels and devices busy, effectively queuing up read operations. Your applications might see this as random periodic slowdowns with no apparent cause.

By contrast, if you use a specific value for VDWQT of 128 (VDWQT=0,128), you are allowing each dataset to have only 128 updated pages before DB2 starts writing them to disk. With our 250 very active datasets, that leaves us with up to 32,000 (128 x 250) updated pages in the buffer pool before we start externalizing them. This is far below the normal DWQT of 30% so we don’t end up writing all of the datasets at the same time. With each dataset hitting its 128 page limit at random intervals, we’d get a lot of smaller write I/O operations, essentially trickling the data out to disk rather than writing in large bursts.


Setting of VDWQT correctly can have a significant impact on write activity generated by all your DB2s and their buffer pools across all your environments. Setting VDWQT to a specific number (of updated pages) per dataset allows DB2 to ‘bleed’ or ‘trickle’ deferred writes when that number of update pages is reached. It also reduces the number of updated pages to be externalized between DB2 & object stops, and DB2 system checkpoints. Furthermore, I/O subsystems can better manage steady streams of smaller write requests than large bursts of writes. 


1 Comment

Be Careful with Trickle Writes

May 28, 2015 06:29 AM by William Miller

Your observation of the VDWQT setting is correct.  However, I believe you have not taken the cost of IO into consideration.  DB2 engages with zOS to initiate the write.  While inexpensive, this does carry some cost.  If you have a really low write threshhold, you will hit the threshold extremely often.  I have seen customers where the threshold was hit over 1000 times in a second, causing excessive usage in the system but not in DB2.

I think that hitting a write threshold every tenth of a second is probably a good balance between security and speed. 

The customer that adjusted their write threshold upwards achieved savings of over 100 MIPS.  Of course, your milage may vary.


-- Bill

Recent Stories
Application Development Using node.js by Bimal Jha

Db2 for z/OS Native REST API

A RESTful Introduction