SQL tuning is not just a matter of looking at an SQL statement and its EXPLAIN information; it is also about verifying whether a tuning change, be it a change to the statement or to a database object, actually has a positive impact on performance. Remember that SQL tuning is about saving elapsed time and/or CPU resources, and there is only one way to really prove that: benchmarking!
So, You’ve Changed an SQL Statement for Better Performance
Be careful when tuning SQL via EXPLAIN analysis. A better-looking access path, or even an SQL statement that shows a lower cost after tuning, is not always an indicator of better performance. Several factors influence both the predicted statement cost and the actual cost of the executing statement: missing or inaccurate catalog statistics, data skew not reflected in those statistics, how the statement is actually used in an application, and the inputs to the statement. EXPLAIN cannot interpret any of this missing information, and so it cannot predict optimal performance in all situations. If you are relying solely on access path analysis, or even worse on predicted statement costs alone, then you potentially have a serious performance tuning problem. Trust me when I say that I have seen many situations where good intentions actually made SQL performance worse.
A combination of object analysis, RUNSTATS, catalog statistics analysis, access path analysis, and benchmarking is the ultimate SQL tuning solution.
Requirements for Benchmarking
In order to do effective benchmarking you are, at the very least, going to need some sort of performance monitor. DB2 for LUW has a fairly robust set of monitors. I personally enjoy using the good old-fashioned command-based snapshot and event monitors, but there is also a variety of monitor functions available that provide excellent performance information. DB2 for z/OS does not come with a built-in monitor, so if you don't own a performance monitor the options for monitoring are limited. I'll address this as I discuss the various benchmarking techniques.
DB2 for z/OS and DB2 for LUW both have a dynamic statement cache (DB2 for LUW has both dynamic and static statements in the cache), and in this cache are held a variety of statement performance metrics. This in itself can be a valuable resource for performance benchmarking. There are many articles available on using the statement cache for performance monitoring, and a review of those articles will help a lot with benchmarking analysis.
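For example, on DB2 for LUW the MON_GET_PKG_CACHE_STMT table function returns per-statement metrics from the package cache. A minimal sketch follows; the specific columns selected here are illustrative, so verify them against the documentation for your release:

```sql
-- 'D' requests dynamic statements; -2 means all database members
SELECT NUM_EXECUTIONS,
       STMT_EXEC_TIME,               -- total elapsed time, milliseconds
       TOTAL_CPU_TIME,               -- total CPU time, microseconds
       SUBSTR(STMT_TEXT, 1, 60) AS STMT
FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) AS T
ORDER BY TOTAL_CPU_TIME DESC
FETCH FIRST 10 ROWS ONLY;
```

Running this before and after a tuning change, with the cache flushed in between, gives a quick before/after comparison per statement.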
There is a variety of off-the-shelf and even shareware benchmarking tools available on the internet. These tools can be used to perform some serious benchmarking, even replicating actual production workloads. The tools are many and varied, and they require significant research to determine which may be best for your environment. In this article I'm assuming no tools are available, and so I will offer simple yet productive benchmarking examples for single-query and/or transaction tuning.
In the simplest cases of single-query tuning, there are some easy things that can be done to verify that a change in an access path will actually result in better performance. While these simple tests are not conclusive, they at the very least give a good idea of whether or not a change will be a success. Please always keep in mind that looking at your watch and hitting Enter is not a benchmark!
SPUFI or DSNTEP2 on DB2 for z/OS
I'll use SPUFI for very simple before/after tests that give me a basic understanding of the performance of one execution over another. Before any query execution that I measure for performance, I may either make sure the data is flushed from the buffers or attempt to make sure the data is held in the buffers. Stopping the page sets (table spaces and index spaces) involved in the query increases the chance that the data has to be read into memory when the query is executed, while running the query multiple times before the measured execution increases the chance that the data is already in memory when the measured query runs. I set autocommit to off so that I can keep the thread active; that way I can collect performance metrics from an online monitor before the transaction goes away. If I'm using the dynamic statement cache to collect performance metrics, then I'll perform a RUNSTATS with REPORT NO UPDATE NONE to make sure the statement cache is flushed prior to each execution I want to measure, and then explain the statement cache and query the DSN_STATEMENT_CACHE_TABLE after each execution I need to measure.
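On DB2 for z/OS, the explain-and-query step looks like the sketch below. It assumes the explain tables, including DSN_STATEMENT_CACHE_TABLE, have been created under your authorization ID and that the monitor trace needed to populate the cache statistics is active:

```sql
-- externalize the cached statements and their runtime statistics
EXPLAIN STMTCACHE ALL;

-- then review the metrics captured for each statement
SELECT STAT_EXEC,                    -- number of executions
       STAT_ELAP,                    -- accumulated elapsed time
       STAT_CPU,                     -- accumulated CPU time
       STAT_GPAG,                    -- getpage requests
       SUBSTR(STMT_TEXT, 1, 60) AS STMT
FROM DSN_STATEMENT_CACHE_TABLE
ORDER BY STAT_CPU DESC;
```

The ORDER BY and the column subset are just one reasonable choice; sort by whatever metric your test is targeting.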
If the query to measure is embedded in a DSNTEP2 execution then I’ll carefully record the start and end times of the submitted job, and then collect an accounting trace report using whatever performance monitor software is available.
When using either SPUFI or DSNTEP2 I may execute only a single statement, multiple executions of the same statement, or different statements. I do, however, usually use this method for only the most rudimentary of tests. It can be quite useful for testing the performance of the materialization of large queries.
CLP on DB2 for LUW and DB2 for z/OS
The DB2 command line processor can be used in a manner similar to the SPUFI/DSNTEP2 technique. You can set up file input for a CLP invocation that contains a single query or multiple queries. Capturing the output in a file and documenting the start and end times will help synchronize the execution of the test with the collected performance metrics. Putting a SELECT CURRENT TIMESTAMP FROM SYSIBM.SYSDUMMY1 at the beginning and end of any SQL script will help record the times as well!
IBM Data Studio SQL Script Execution for DB2 for LUW and z/OS
When running queries via IBM Data Studio the SQL results window does include the query execution time in the status tab. While this is an easy way to see if one query returned a result faster than another query it is not necessarily the most reliable method and should be backed by metrics gathered from a performance monitor to be deemed reliable. When used in combination with a database performance monitor it could give you some idea as to the influence of network latency on overall query performance. You should take note, however, that IBM Data Studio has a limit to the number of rows returned. If possible, you should modify the limit so that all rows are returned in order to get the most accurate comparison between two query executions.
The db2batch Utility for DB2 for LUW and z/OS
This extremely flexible benchmarking tool can be used against a DB2 for LUW or DB2 for z/OS database. I’ve had problems getting it to work in the past, but am currently using the db2batch command in the IBM Data Server Client version 10.5.2 quite successfully against a DB2 for z/OS database. The results will include elapsed time output per statement and at a summary level, with the option of much more detailed performance metrics for a DB2 for LUW database (including monitor metrics). The input to db2batch can be one or more SQL statements grouped into one or more transactions (groupings) with an iteration set for each transaction. You can even set up a parameter file to include input parameters to all of the SQL statements in the benchmark, although this can be a bit tricky and take some trial and error to set up.
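As a sketch, a db2batch input file might look like the following; the block and repetition syntax comes from the db2batch control options, while the query and the counts are just examples:

```sql
--#SET PERF_DETAIL 3
--#SET ROWS_OUT 0
--#BGBLK 100
SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO = '000010';
--#EGBLK
```

The BGBLK/EGBLK pair repeats the enclosed statements 100 times as one transaction grouping. A file like this could then be run with something along the lines of db2batch -d <database> -f bench.sql, which reports elapsed time per statement and at the summary level.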
I typically set up batches of SQL statements to test for I/O or CPU performance much in the same way I run tests against DB2 for z/OS using REXX (see below). If I want to test for random I/O performance I’ll code some clever SQL to generate random input for my statements, such as this query against the DB2 sample table EMP (or EMPLOYEE).
WITH RANDOM_EMPNO(EMPNO) AS
(SELECT DEC((INT(RAND() * 34) + 1) * 10, 9, 0)
   FROM SYSIBM.SYSDUMMY1)
SELECT E.EMPNO, E.LASTNAME
FROM EMP E
WHERE E.EMPNO = (SELECT SUBSTR(DIGITS(EMPNO), 4, 6)
                   FROM RANDOM_EMPNO)
REXX on z/OS
I have found REXX to be one of the most useful tools for benchmarking queries on DB2 for z/OS. I almost always use REXX routines in combination with an online DB2 performance monitor to capture SQL performance metrics. This involves writing two different types of REXX routines: one for testing I/O and one for testing CPU. In either case the REXX routine connects to a data source, declares a cursor, and then within a loop (usually between 50 and 5,000 iterations) opens the cursor, fetches from the cursor, and closes the cursor. Once the loop is finished, I sometimes add an instruction that starts an interactive trace. That way, if I am running the REXX program from a TSO prompt (or ISPF option 6), it will pause at that point, allowing me to collect the performance metrics from my online monitor. If an online monitor is not available, I record the start and end times of my tests and pull accounting trace reports from a batch reporting tool.
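The shape of that open/fetch/close loop can be sketched outside REXX as well. The following Python analog uses the standard sqlite3 module as a stand-in data source purely so the sketch is self-contained; a real test would drive the DB2 table being tuned, and the table and key values here merely mimic the sample EMP table:

```python
import sqlite3
import time

def benchmark_cursor_loop(conn, sql, params_list):
    """Open a cursor, fetch one row, and close it once per input key,
    timing the loop as a whole (the REXX-for-I/O pattern)."""
    start = time.perf_counter()
    rows_fetched = 0
    for params in params_list:
        cur = conn.cursor()                  # "OPEN CURSOR"
        cur.execute(sql, params)
        if cur.fetchone() is not None:       # "FETCH"
            rows_fetched += 1
        cur.close()                          # "CLOSE CURSOR"
    return time.perf_counter() - start, rows_fetched

# Stand-in for the DB2 sample EMP table (EMPNO '000010'..'000340').
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno TEXT PRIMARY KEY, lastname TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [(f"{n:06d}", f"NAME{n}") for n in range(10, 350, 10)])

keys = [(f"{n:06d}",) for n in range(10, 350, 10)] * 50   # 1,700 iterations
elapsed, fetched = benchmark_cursor_loop(
    conn, "SELECT lastname FROM emp WHERE empno = ?", keys)
print(f"{fetched} fetches in {elapsed:.4f} seconds")
```

Timing the whole loop rather than single executions smooths out per-call noise, which is exactly why the REXX routines iterate 50 to 5,000 times.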
The only difference between my REXX-for-CPU and REXX-for-I/O programs is that the REXX-for-I/O program reads data in from an input file. Usually I run a query against the table or tables used in the query I am testing to extract keys I can use as input to the REXX program. I also add an additional column, generated by the RAND() function, to the output and sort the output on that column, so that the input values arrive in random order. The output generated from this query is placed in the file that is used as input to the REXX-for-I/O program.
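That extract-keys, add-a-RAND()-column, and sort technique is easy to script outside the database too. This small illustrative Python sketch (the file name bench_input.txt is hypothetical) decorates each key with a random value, sorts on it, and writes the keys out in the resulting random order:

```python
import random

def randomize_keys(keys, seed=None):
    """Pair each key with a random sort value, sort on that value, and
    strip it off, so the keys come back in random order."""
    rng = random.Random(seed)
    decorated = [(rng.random(), key) for key in keys]  # the extra "RAND()" column
    decorated.sort()                                   # sort on the random column
    return [key for _, key in decorated]

keys = [f"{n:06d}" for n in range(10, 350, 10)]        # EMPNO-style keys
shuffled = randomize_keys(keys, seed=42)

# One key per line, as the REXX-for-I/O program would read them.
with open("bench_input.txt", "w") as f:
    f.write("\n".join(shuffled))
```

Fixing the seed makes the "random" ordering reproducible, which is useful when you need to rerun an identical test.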
Native SQL Stored Procedures for DB2 for LUW and z/OS
Native SQL procedures can be used in much the same way as the REXX routines on DB2 for z/OS. Setting up loops to test statements for CPU consumption can in many cases be much easier than with REXX, and invoking the benchmark tests can be as easy as calling the stored procedure from a variety of sources, such as the command line processor or IBM Data Studio. One of the challenges is simulating file input, since file input is not possible in an SQL routine. You could establish a separate driving cursor to provide the input, but that will influence the performance metrics. Another technique is to simulate random input, using an approach similar to what I have used in db2batch tests.
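A minimal sketch of such a CPU-test loop as an SQL procedure follows; the procedure name, variable names, and constant key are all hypothetical, and the constant key is what makes this a CPU test rather than an I/O test:

```sql
CREATE PROCEDURE BENCH_CPU (IN NUM_ITERATIONS INTEGER)
LANGUAGE SQL
BEGIN
  DECLARE I INTEGER DEFAULT 0;
  DECLARE V_LASTNAME VARCHAR(15);
  WHILE I < NUM_ITERATIONS DO
    -- the statement under test
    SET V_LASTNAME = (SELECT LASTNAME FROM EMP WHERE EMPNO = '000010');
    SET I = I + 1;
  END WHILE;
END
```

Invoking it is then a one-liner, for example CALL BENCH_CPU(5000), from the CLP or IBM Data Studio, with metrics collected from your monitor as with the REXX tests.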
Advice for Running Benchmarks
- Prepare the data for the test. This would involve making sure that the test tables have the correct data in them, are backed up if the test will modify data, and are either organized or disorganized based upon what you are testing for.
- Prepare any input files for the test. This may involve generating the input files and/or sorting a file to simulate a particular input pattern. Make sure that there is a base copy of the input data that can be used to recreate the input if you make modifications and eventually need to go back to the original.
- Make sure the appropriate traces are set and any monitoring tools are active.
- Flush the dynamic statement cache if you plan on using the metrics stored there as test results.
- Check to make sure the machine is not currently overloaded. A busy LPAR will heavily skew test results. Run your tests when there is some headroom so that machine stress is not a significant variable, and pay attention to in-DB2 times only when reviewing the results.
- Flush the DB2 buffers. If you are testing for I/O you especially should flush the buffers before running a test. Also, make sure that your objects and buffers are sized in such a way that you will be somewhat reflecting reality when running the tests.
- Execute the benchmark test. Be sure to mark the exact start and stop time of the test so that you can correlate the results with the test. Set up a Microsoft Excel document to help you document the tests along with the key metrics.
- Collect the results. Put the results in a Microsoft Excel spreadsheet with full description and metrics. Include a paragraph that explains the reason for the test. When you present the reports in a design meeting, people will grill you!
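The time-marking and documentation steps above can be automated with a tiny results log. The following Python sketch is illustrative (the file name, test name, and metric names are all hypothetical); it appends one row per benchmark run to a CSV file that Excel opens directly:

```python
import csv
from datetime import datetime, timezone

RESULTS_FILE = "benchmark_results.csv"   # hypothetical log file

def log_run(path, test_name, reason, start, stop, metrics):
    """Append one row per benchmark run: identification, the reason for
    the test, exact start/stop times, and the key metrics collected."""
    row = {"test": test_name, "reason": reason,
           "start": start.isoformat(), "stop": stop.isoformat(),
           "elapsed_sec": (stop - start).total_seconds(), **metrics}
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=row.keys())
        if f.tell() == 0:                # write a header on first use
            writer.writeheader()
        writer.writerow(row)

# Example run record; the metric names here are illustrative.
start = datetime(2024, 1, 15, 9, 30, 0, tzinfo=timezone.utc)
stop = datetime(2024, 1, 15, 9, 31, 30, tzinfo=timezone.utc)
log_run(RESULTS_FILE, "idx_change_q1", "test new index on EMP",
        start, stop, {"class2_cpu_sec": 1.42, "getpages": 5321})
```

Keeping the reason alongside the numbers means that, months later, the spreadsheet still answers the "why did we run this?" question in the design meeting.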
When interpreting DB2 for z/OS benchmark results, make sure that you pay attention primarily to the class 2 and class 3 measurements. There can be great variation in REXX performance, as well as in any batch or online testing you perform, due to the workload manager settings for your test jobs or TSO address space. Repeat an identical test several times to confirm that a test that should produce consistent performance metrics actually does.
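Checking that repeatability is simple arithmetic. This hedged Python sketch flags a set of repeated-run timings whose spread (coefficient of variation) exceeds a chosen tolerance; the 10% default and the sample timings are purely illustrative:

```python
from statistics import mean, stdev

def is_consistent(elapsed_times, tolerance=0.10):
    """Return True when the coefficient of variation (stdev / mean) of
    the repeated-run timings is within the given tolerance."""
    avg = mean(elapsed_times)
    return (stdev(elapsed_times) / avg) <= tolerance

steady = [2.01, 1.98, 2.03, 2.00, 1.99]   # e.g. class 2 elapsed seconds
noisy = [2.01, 3.50, 1.10, 2.00, 4.75]
print(is_consistent(steady), is_consistent(noisy))
```

If a test that should be stable fails this kind of check, suspect an environmental variable (a busy LPAR, WLM settings, buffer contents) before trusting any single run's numbers.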
I consider benchmarking a crucial part of SQL tuning and, ultimately, of SQL and overall application performance. There are so many ways to do it without building an application, and in many situations these types of tests can be conceived, coded, executed, and measured in a matter of a few hours. Imagine all the time saved. Instead of sitting in meetings for hours debating how everyone “thinks” something will perform, you can run a simple test that actually demonstrates how it will perform. Be sure to keep very detailed and careful documentation, including any pretty bar charts or graphs, so that the people you present the results to will understand the purpose and the result. Hopefully, this will enable your team to make an educated guess based upon some real numbers when making those critical design decisions!