July Spotlight – Db2 LUW: Types of I/O
Since the beginning of commercial applications, input/output (I/O) has been one of the main determining factors for overall performance. In the world of relational database, I/O can easily surpass all other considerations, particularly if an SQL statement asks for a very large result or a missing index causes the scan of a very large table. The majority of Db2 performance issues can be traced to one of these two causes.
I/O involves reading and writing data to disk. Older disk drives involve rotation, where the area of the disk containing your data has to reach the read/write heads on the device. Newer solid state drives (SSD) are effectively memory devices, and they are much faster than older drives. Either way, there is retrieval time involved in reading data from disk.
To speed up I/O, Db2 uses large memory areas in the computer’s main storage for a variety of purposes. These memory areas include the catalog cache, package cache, dynamic statement cache, and mostly importantly, the buffer pools for holding user data.
The proper sizing of these memory areas can greatly improve Db2 system and application performance. IBM has made memory sizing much easier through its facilities for automatic memory tuning, which handles most, but not all, aspects of tuning.
Let’s look at various types of I/O and their affect on Db2 performance.
Logical and Physical I/O
An SQL statement, in response to an application need, asks Db2 for data. This generates a logical I/O request. Db2 then retrieves the data from the buffer pool (this can be one or more buffer pools depending on how the system was defined).
If the data needed is already resident in the buffer pool, it can simply be returned to the application. However, if the data is not in the buffer pool, it is necessary to read the data from disk into the buffer pool. The operation of retrieving data from disk is called physical I/O. When physical I/O is performed it takes additional CPU and elapsed time.
All application requests for data result in logical I/O, but they do not necessarily result in physical I/O provided the data needed is already in the buffer pool. In a few cases, I have seen small databases fit completely into buffer pools such that no physical I/O is required.
While physical I/O normally occurs, there are ways to reduce it by improving the index design and increasing the sizes of buffer pools so that more data can be retained longer in memory.
Types of Physical I/O: Synchronous and Asynchronous
Synchronous reads use indexes to return result sets. The best case is where a primary index is used to retrieve a single row (the Where clause contains PK =). If the row is not in the buffer pool, Db2 can read the index to find the precise location on disk. Synchronous I/O can be used to read multiple rows in the same fashion.
In other situations, where the result set is large or there is no useful index, Db2 might be required to perform a great deal more I/O. The worst case is when it is necessary to read an entire table. This second type is called asynchronous I/O, and it is used where the result set is large or where no better access path exists. Scans are also important in batch processing where entire tables are processed. Asynchronous I/O is considered an inhibitor to good performance especially for online applications.
In the earliest days of Db2, there were two choices: provide indexes for simple SQL statements or risk a table scan. A table scan would read each table page individually and the time required was often disastrous to good performance.
The solution to scans was to introduce a read ahead capability called prefetch. This allowed Db2 to avoid I/O wait time by bringing pages into the buffer pool before Db2 would need them. The prefetch of data and indexes made the worst case scenario perform much better.
Prefetch today allows for up to 256 pages to be read in a single I/O, although 64 or 128 pages are more common and provide good performance. IBM found other ways to improve prefetch such that Db2 can use list prefetch, dynamic prefetch (now further improved and called smart prefetching) or the original sequential prefetch.
The next item to discuss in how synchronous and asynchronous I/O work with each other in a buffer pool.
Physical I/O Co-existence
For synchronous I/O to work best, the data most used should remain in the buffer pool to reduce the amount of physical I/O. A larger buffer pool allows more pages to remain resident. The result can be a large number of transactions providing high performance, particularly where the same data is used repeatedly. The same page can read and updated many times without having to be read again from disk.
For asynchronous I/O, the situation is quite different. Here we have a larger number of pages read per I/O, followed by a larger amount of CPU time being used to locate the rows of interest once they are returned to the application. In comparison to synchronous I/O, where the rows returned are quite specific and valuable to the application, the rows sent to the application via asynchronous I/O are not well qualified, and further qualification is done via SQL or the application. Based on the number of pages read, the chances of the result being kept in the buffer pool are quite low, as are the chances of the result being used again quickly by another SQL request.
Another aspect of asynchronous I/O are the number of pages used for each I/O and the number of I/Os performed. Prefetch operations cause older pages to be overwritten. This means that pages used for frequent synchronous operations may have to be read again from disk using additional CPU time and causing waits for I/O,
Solutions across the IBM RDBMS family
IBM has used three different development teams to build Db2 for LUW, Informix, and Db2 for z/OS. Each organization found its own solution for buffer pool I/O and a brief description is given here.
Db2 for LUW
To avoid the negative impact of asynchronous I/O, it is possible to define a block area within buffer pools. The block area is used for asynchronous I/O, thereby separating the two types of I/O from each other and eliminating any negative impact.
Defining a block area is a manual operation, and it is not supported by automatic buffer pool tuning. However, measured results show that block-based buffer pools are worth the trouble to improve performance.
Whenever a sequential scan of large tables is necessary, Informix bypasses the buffer pool and uses a light scan. This happens automatically and avoids any negative impact of sequential I/O on the residency of pages in the buffer pool.
This great approach is like the block based buffer pools in Db2 for LUW, but it is automatically defined and in place.
Db2 for z/OS
Db2 for z/OS provides no direct support to separate synchronous from asynchronous I/O. There is a threshold within the buffer pool to control how much of the pool can be used for sequential operations.
Tuning methodologies use detailed measurement data to identify the percentages of synchronous and asynchronous I/O for each table and index object and they are separated into separate buffer pools.
Understanding the types of I/O is important background information for every tuning effort. A large amount of asynchronous I/O is a strong indicator of the need for further index tuning.
About Martin Hubel
I am an IBM Champion and Gold Consultant. My Db2 industry activities include being the host of the Db2Night Show, a member of the IDUG Volunteer Hall of Fame, an active IDUG speaker of over 50 presentations over the past 30 years, a panelist, an author, and an executive committee member of the Central Canada Db2-IMS Users Group.