The Basics of Achieving High Performance with DB2
This month, IDUG’s topic of discussion will be “Performance.” This is an extremely broad topic that has the attention of many of us in our daily activities. We’re constantly striving for high performance, our managers are constantly demanding it, and IBM is dedicated to delivering a higher level of performance with each release of DB2. You can expect this month’s content on the IDUG website to be rich, with performance topics covering a diverse range of features; this first article, however, will focus on some of the basics of performance.
System Tuning Versus Application Tuning
There has always been a division, in the eyes of some, between system tuning and application tuning. Many DBAs spend a significant amount of time monitoring and tuning DB2 configuration parameters in an attempt to achieve a higher level of performance for the applications using DB2. Certain parameters, such as buffer sizes, can make a huge difference in performance if they are terribly undersized to begin with; but once most parameters are set within a reasonable threshold, further changes typically have little to no impact on performance. I believe in the 80/20 rule when it comes to performance tuning: 20% of the focus should be on the system, and 80% should be on the database and applications. When it comes to application performance tuning, you should be looking at either the SQL or the database design. Hopefully, both together!
Application Design for Performance and SQL Tuning
There are two major challenges when it comes to SQL tuning. The first and most important is the quantity of SQL statements issued, and the second is the relative performance of each SQL statement.
Ever hear the saying that the most efficient SQL statement is the one that is never executed? Well, it’s true. The biggest challenge in today’s fast-paced application development environment is keeping the number of SQL statements executed per transaction to a minimum. In many cases, development relies on a framework and a service-oriented architecture, and database tables are often mapped one to one with objects in the data services application layer. Repeated calls to a specific service can likewise result in repeated calls to populate an object from a database table. A greedy design in which service calls are made repeatedly within a unit of work can cause extreme performance degradation. The database often gets the blame, yet the database monitors show nothing more than extreme quantities of reasonably performing SQL statements, and there is very little system or database tuning that can be done in these situations.

This is why, when designing a new database application, the design needs a more consciously performance-oriented approach. That approach should take two forms: the first focuses on calling the database only when necessary, and the second involves combining multiple calls into a single call. The first is conceptually easy: if a data object is already populated and there is no need to refresh it, the database call should be avoided. This may require some additional tests within the application code, but the results can be quite dramatic. The second is more invasive, and is best reserved for situations where performance is more of a concern than speed of delivery and flexibility of the code. I’ve had great success designing applications in which multi-table SQL statements were concentrated in the high-performance components, while normal development practices applied to the majority of the application logic.
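The first form, calling the database only when necessary, can be as simple as a guard around the object-population code. The sketch below uses Python with an in-memory SQLite database purely so the pattern is runnable; the table, column, and function names are invented for illustration, and the same pattern applies unchanged to a DB2 connection:

```python
import sqlite3

# SQLite stands in for DB2 purely to make the sketch runnable; the
# pattern itself is database-independent. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")

queries_issued = 0   # instrumentation: count the SQL calls actually made
_cache = {}          # objects already populated in this unit of work

def load_customer(cust_id):
    """Populate a customer object, hitting the database only on a cache miss."""
    global queries_issued
    if cust_id in _cache:        # already populated and no refresh needed:
        return _cache[cust_id]   # skip the database call entirely
    queries_issued += 1
    row = conn.execute(
        "SELECT id, name FROM customer WHERE id = ?", (cust_id,)).fetchone()
    _cache[cust_id] = row
    return row

# A unit of work that touches the same customer five times issues one query:
for _ in range(5):
    customer = load_customer(1)
print(queries_issued)   # 1
```

The "additional tests within the application code" amount to one dictionary lookup here; the payoff is four avoided database round trips in a five-touch unit of work.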
For these high-performance components, you simply need to look at which tables are accessed during a unit of work, which columns from those tables are actually required (if any), and the relationships between the tables. If the SQL complexity is too much for the application developers, or if there is no interest in placing that complexity within the application code, then stored procedures, triggers, and user-defined functions provide a nice alternative for bundling multiple SQL calls into one. There is almost nothing to tune in a simple statement against a single table with one predicate on a primary key column; with multiple-table access, however, more tuning opportunities exist, and in general the more complex SQL statement will almost always outperform many simple statements providing the same functionality. This is especially true across a network. For example, I have tested programmatic joins against the equivalent SQL joins for two tables and realized a 30% performance gain for the SQL join.
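The difference between a programmatic join and an SQL join is easiest to see in code. Again SQLite stands in for DB2 so the comparison is runnable, and the tables are invented for the example; the point is the call counts, not the engine:

```python
import sqlite3

# SQLite stands in for DB2; customer/orders are hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, cust_id INTEGER, total REAL);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 10.0);
""")

def programmatic_join():
    """One statement per parent row: 1 + N database calls."""
    rows, calls = [], 1
    for cust_id, name in conn.execute("SELECT id, name FROM customer"):
        calls += 1
        for (total,) in conn.execute(
                "SELECT total FROM orders WHERE cust_id = ?", (cust_id,)):
            rows.append((name, total))
    return rows, calls

def sql_join():
    """The same answer as a single two-table statement: one call."""
    rows = conn.execute("""
        SELECT c.name, o.total
          FROM customer c
          JOIN orders o ON o.cust_id = c.id
         ORDER BY o.id
    """).fetchall()
    return rows, 1

rows_a, calls_a = programmatic_join()   # 3 calls for 2 customers
rows_b, calls_b = sql_join()            # 1 call, identical rows
```

With two parent rows the programmatic version already makes three calls to the join's one; across a network, each extra call adds a round trip, which is where gains like the 30% figure above come from.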
More complex SQL statements offer more opportunities for tuning, but before you set off into complex SQL tuning, you need to step back and understand the overall performance of your application. This is where active monitoring of your production DB2 system is important. Hopefully you have a vendor product that can analyze the appropriate DB2 trace output to produce meaningful performance reports. There are also DB2 components, such as the dynamic statement cache, that can be used to produce performance metrics. Whatever the case may be, you should be looking at performance at the application level, the user level, and the statement level. If you are going to tune SQL, tune first the statements that consume the most resources or that exceed a specified SLA. Regular monitoring should be a part of application and SQL performance tuning, especially when changes are being implemented. The ability to show a real-world result is crucial to getting management support for tuning.
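Picking the first tuning target is mechanical once you have statement-level metrics. The sketch below assumes a hypothetical snapshot shaped roughly like what a monitor or the dynamic statement cache might report (statement text, execution count, average CPU milliseconds); the field layout and numbers are invented:

```python
# Hypothetical statement-level metrics: (statement text, execution count,
# average CPU ms per execution). Shape and values are invented.
snapshot = [
    ("SELECT ... FROM ORDERS ...",   90000, 0.4),
    ("SELECT ... FROM CUSTOMER ...",   120, 85.0),
    ("UPDATE ACCOUNT ...",            5000, 1.2),
]

def tuning_candidates(stmts, sla_ms=50.0):
    """Rank statements by total CPU consumed, and flag SLA breaches."""
    ranked = sorted(stmts, key=lambda s: s[1] * s[2], reverse=True)
    breaches = [s for s in stmts if s[2] > sla_ms]
    return ranked, breaches

ranked, breaches = tuning_candidates(snapshot)
# The cheap-looking 0.4 ms statement tops the list: 90,000 executions
# add up to more total CPU than the single slow query does.
```

This is why both views matter: the SLA breach is a statement problem, while the top total consumer is often a call-frequency problem of the kind described earlier.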
When tuning SQL statements, I typically use two techniques to determine the potential savings of a database or SQL change: the EXPLAIN facility and benchmarking. EXPLAIN takes an SQL statement as input and externalizes the predicted access path within the database engine. Several EXPLAIN tables are populated when a statement is EXPLAINed, so it is best to use a tool to analyze the access path. IBM provides a free tool, IBM Data Studio, to visualize the EXPLAIN table information, and several vendors supply tools for a fee. You can, if you are adventurous, write your own queries against the EXPLAIN tables; various examples are available on the Internet. In addition to access path information, EXPLAIN also provides statement cost information. This is the internal cost information DB2 uses to determine the access path, and it is also information you can use to compare the predicted cost of two logically equivalent SQL statements. Comparing statement costs in EXPLAIN is a good start, but it is not always the definitive guide to better SQL performance, so I generally use the cost calculation as only one part of the predicted savings of an SQL tuning change.

I also look at the access path and perform benchmark testing, usually with REXX or SPUFI on the mainframe, or with the db2batch benchmarking tool on DB2 for LUW. If a test database is properly configured and its statistics updated to match production, you can get a pretty good estimate of the potential savings by running a benchmark test and capturing the performance with a monitoring tool or the statement cache. I always back any proposed SQL performance change with both EXPLAIN output and benchmark test results.
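The benchmarking side needs nothing more than a harness that runs each candidate statement several times and reports a stable number. This sketch uses Python and SQLite in place of SPUFI or db2batch so it runs anywhere; the two statements are an arbitrary logically equivalent pair chosen for illustration:

```python
import sqlite3
import statistics
import time

# SQLite stands in for the real benchmarking tools so the harness is
# runnable; the table and the statement pair are invented examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (k INTEGER PRIMARY KEY, v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i % 100) for i in range(10000)])

def bench(sql, runs=5):
    """Return the result and median elapsed seconds over several runs."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        result = conn.execute(sql).fetchall()
        samples.append(time.perf_counter() - start)
    return result, statistics.median(samples)

# Two logically equivalent formulations of the same question:
rows_in, t_in = bench("SELECT COUNT(*) FROM t WHERE v IN (1, 2, 3)")
rows_or, t_or = bench("SELECT COUNT(*) FROM t WHERE v = 1 OR v = 2 OR v = 3")
# Identical answers; the timings tell you which formulation to prefer here.
```

Taking the median of several runs, rather than a single timing, is the same discipline you would apply when capturing db2batch or monitor output: one run proves nothing.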
Database Design for Performance
There are a myriad of tuning possibilities when it comes to the database, but before jumping at any of DB2’s touted performance features, it is critical to first understand the problem that needs to be solved. If you want to partition a table, are you actually hoping the partitioning yields a performance improvement, or is it more for data management or flexibility? Clustering of data is critical, but the clustering has to reflect potential sequential access for single-table, and especially multi-table, access. Generic table keys are okay, but clustering on them is meaningless when processing follows a key-based sequential pattern or when multiple tables are being joined; in those situations, child table data should be clustered by the parent key. Natural keys may be preferred, unless the compound keys become exceedingly large. There are a number of page sizes to choose from, which can be especially important for indexes where page splitting may be a concern. Tables can also be designed specifically for high-volume inserts with features such as “append only” and “member cluster.”
There is no reason to sit in meeting rooms for hours or days trying to come up with a database design that you “think” will perform properly. I personally let the database itself tell me how to design it. This involves setting up test database designs and test conditions that mimic the predicted application and database design, usually with generated data and statements run through SPUFI, REXX programs, db2batch, or shell scripts, while performance information is gathered from monitors and compared. Sound complicated? It really isn’t, and most proof-of-concept tests I’ve been involved with have lasted no more than a week or two. The effort is minimal, but the rewards can be dramatic.
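A proof of concept of this kind reduces to three steps: build each candidate design, load generated data, and time the predicted workload against each. The sketch below compresses that loop into a few lines, with SQLite keeping it self-contained; the designs (an orders table with and without a supporting index) and the workload are invented for illustration:

```python
import sqlite3
import time

def run_poc(design_ddl, rows=20000, customers=500):
    """Generate data into a candidate design and time the mocked workload."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(design_ddl)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                     [(i, i % customers, float(i)) for i in range(rows)])
    start = time.perf_counter()
    for cust_id in range(customers):           # the predicted access pattern
        conn.execute("SELECT SUM(total) FROM orders WHERE cust_id = ?",
                     (cust_id,)).fetchone()
    return time.perf_counter() - start

# Two hypothetical candidate designs for the same table:
design_a = "CREATE TABLE orders (id INTEGER, cust_id INTEGER, total REAL);"
design_b = design_a + "\nCREATE INDEX ix_orders_cust ON orders (cust_id);"

elapsed_a = run_poc(design_a)   # 500 full scans
elapsed_b = run_poc(design_b)   # 500 index probes, usually far cheaper
```

The numbers, not the meeting room, decide between the designs; swapping in different DDL, data volumes, or access patterns is a one-line change each.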
This has been a high-level review of DB2 performance tuning, but I hope it serves as an inspiration to look for specific solutions to your performance issues. Don’t be afraid to test things and run some benchmarks. There are plenty of examples available in articles on the IDUG website’s code page and on various other Internet sites. Keep your eyes on the IDUG website over the month of July, as we’ll be continuously posting articles, presentations, and recordings on specific performance topics. Also, look for our first comments on DB2 for z/OS Version 11 performance in late July or early August.