Sometimes we forget. It isn’t so much getting older (I hope) but more that we continue to get busier and busier. In the rush to pick up and handle the new tasks thrown our way, we sometimes miss the basics. Application design that takes performance into account is one of those basics. If you want good performance in a DB2 application, the best thing you can do is design it with performance in mind.
Here’s a case in point: I was recently the lead on a project with some excellent developers. They have a lot of knowledge and experience, but hadn’t really worked with DB2 for z/OS at all.
Our Claims database is in transition. We’ve spent several years migrating our data from VSAM files into a fully relational database. The complexity of the data, with many repeating groups of elements, led to a design with over 140 tables. This, of course, leads to performance challenges when handling inquiry activities. The application that uses the database always needs the entire Claim so we are accessing all those tables all the time.
Each Claim has 1 to 998 Service Lines, with an average of 2-3 lines. About 50 of the tables contain Service Line data. This is an ideal configuration to take advantage of multi-row SQL statements, and we’ve done that with our COBOL/CICS modules that maintain the data. In fact, we try to touch every table in the database just once per Claim. That is still a lot of SQL statements, but when we are dealing with the current record our overhead is well within planned CPU and response times. Our problems began when we discovered the need to access hundreds and sometimes thousands of historical Claims, leading to much effort optimizing the historical access while maintaining application functionality. But still, we have outliers where we cannot meet our service level agreements. This work has been done under one project, with experienced and capable DB2 and CICS developers and DBAs involved constantly.
The long term solution for dealing with the large (and growing) number of tables requires yet another transition. This time we’re moving toward the use of XML for storing the Claim in one place. I’ve written about that elsewhere and won’t go into much detail here. The important point is that we’re designing our newer applications to work well with the future database and adequately with the current database. And that design needs to cover several aspects related to performance.
The Inquiry Application
With the data firmly entrenched in DB2, it was time to start building our GUI inquiry application. We brought in some great programmers who built an amazing GUI. From our users’ perspective it is ideal. The GUI allows them to do in seconds what used to take them minutes of searching through various CICS screens to find bits and pieces of information.
The development team and DBAs worked together to ensure database searches used the planned access paths and indexes. SQL was reviewed and recommended changes were made. Based on the transition of the database from relational tables to XML-based content, our design pulls in the entire Claim and sends it to the browser in JSON format. We have a little more overhead while we continue to read from our 140 tables, but that was expected during our transition. The data access layer of the application is designed to handle either format (fully relational or XML) for any individual Claim. Functional system and user testing found performance to be within expectations. We even rolled it out to 250 pilot users without any complaints of performance. In-DB2 CPU times (total seconds per day in the chart below) looked pretty good too (with the exception of a single runaway query on 5/8 due to a coding defect).
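That dual-format data access layer can be sketched roughly as follows. This is a minimal illustration, not our actual code; every class, method, and field name here is hypothetical.

```java
// Sketch of an access layer that hides whether a Claim is stored
// relationally (our current 140-table design) or as XML (the future
// design). All names are illustrative.
interface ClaimReader {
    Claim readClaim(String claimId);
}

// Simplified Claim holder; the real object carries far more data.
record Claim(String claimId, String sourceFormat) {}

class ClaimAccessLayer {
    private final ClaimReader relationalReader;
    private final ClaimReader xmlReader;

    ClaimAccessLayer(ClaimReader relationalReader, ClaimReader xmlReader) {
        this.relationalReader = relationalReader;
        this.xmlReader = xmlReader;
    }

    // Pick the right reader per Claim; callers never see the difference,
    // so individual Claims can migrate formats without application changes.
    Claim readClaim(String claimId, boolean storedAsXml) {
        return storedAsXml ? xmlReader.readClaim(claimId)
                           : relationalReader.readClaim(claimId);
    }
}
```

The point of the design choice is that the caller gets the same Claim object either way, which is what lets us run "adequately with the current database and well with the future one."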
It looked as though our database and application design efforts had paid off.
What did we miss?
You know we had to have missed something, otherwise there’d be no point to this article. This is where forgetfulness had its impact. I was so used to dealing with the experienced DB2 team that I didn’t watch the GUI team’s use of SQL very closely. Here’s an example of a typical statement:
SELECT DED_AMT, COPAY_AMT, COVD_AMT
FROM SVC_LN_FINCL -- illustrative table name; one of the 50 Service Line tables
WHERE CLM_ID = ?
AND RCVD_DT = ?
AND SVC_LN_NBR = ?
A quick review of this statement doesn’t draw much concern. It’s a simple retrieval of the data for one of the Service Line child tables. Block fetch from the client will yield the same effective performance as a multi-row operation. It specifies the RCVD_DT column which we use for physical partitioning so we’ll get partition pruning. The explain results will look good as well. Not worth spending too much time on, right? Have you spotted the problem yet?
Remember, we can have 1 to 998 Service Lines for each Claim, and we have 50 tables per Service Line. No one noticed the problem because most Claims have 1-3 Service Lines. With 3 Service Lines, we’ll execute 150 (3 x 50) SQL statements retrieving the Service Line data. With a larger Claim, though, we start to see significant elongation in response times. That’s exactly what our (very thorough) load and performance team found.
When Claims with 130 Service Lines were tested in our load & performance environment, the response times jumped up to 10 seconds and then 20 seconds for Claims with 220 Service Lines. That’s 11,000 (220 x 50) SQL statements from the web server to DB2 for a single Claim.
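The arithmetic behind those numbers is worth making explicit: one statement per Service Line per table. A back-of-the-envelope sketch (the class and method names are mine, for illustration):

```java
// Back-of-the-envelope statement count for the original per-line SQL:
// each of the ~50 Service Line tables is queried once per Service Line.
class StatementCount {
    static final int SERVICE_LINE_TABLES = 50;

    // Statements needed to read all Service Line data for one Claim.
    static int perClaim(int serviceLines) {
        return serviceLines * SERVICE_LINE_TABLES;
    }
}
```

So a typical 3-line Claim costs 150 statements, the 130-line test Claim costs 6,500, and the 220-line Claim costs the 11,000 cited above.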
I’d forgotten to make sure the Java developers working on this part of the application accessed each table only once per Claim. So, where I was expecting 50 SQL statements we were executing 11,000 statements. Why hadn't the developers done the obvious optimization? Because it wasn’t obvious to them. Their focus was on delivering the thousands of functional requirements needed for the GUI to be successful.
One of the saving graces in the application design was the use of the web browser to do most of the work. In a traditional GUI, the web server has to re-read the data from the database in order to build, format and send each page to the browser. This requires multiple database accesses as the user moves from one page of information to another within the UI. In our design, we’re building the pages in the browser from the data that is already there.
From an SQL viewpoint, the fix is fairly simple. We just needed the SQL statements modified to look like:
SELECT DED_AMT, COPAY_AMT, COVD_AMT
FROM SVC_LN_FINCL -- illustrative table name; one of the 50 Service Line tables
WHERE CLM_ID = ?
AND RCVD_DT = ?
From an application coding viewpoint, this takes a bit of refactoring, but in the end it is worth the effort. We are now retrieving all rows for every Service Line with each statement instead of just the rows for a single Service Line. That reduced our Service Line SQL statements from 11,000 to 50 for our 220-line Claims. We cut those longer load & performance response times in half. Plus, it cut many of our smaller, more common cases down to sub-second response times.
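The refactoring itself is mostly a matter of grouping: once each table returns all of a Claim’s Service Line rows in a single result set, the application splits them back out by line number in memory. A minimal sketch, assuming a hypothetical Row shape (the column names mirror the example SQL; nothing here is our actual schema):

```java
import java.util.*;

// Sketch of the refactor: fetch ALL Service Line rows for a Claim in one
// statement per table, then group them by SVC_LN_NBR in memory instead of
// issuing one statement per line. Row and its fields are illustrative.
class ServiceLineGrouper {
    record Row(int svcLnNbr, double dedAmt, double copayAmt, double covdAmt) {}

    // Groups a single result set by Service Line number, keeping the
    // lines in ascending order (TreeMap sorts its keys).
    static Map<Integer, List<Row>> bySvcLine(List<Row> resultSet) {
        Map<Integer, List<Row>> grouped = new TreeMap<>();
        for (Row r : resultSet) {
            grouped.computeIfAbsent(r.svcLnNbr(), k -> new ArrayList<>()).add(r);
        }
        return grouped;
    }
}
```

The database cost drops from one statement per line per table to one statement per table, and the in-memory grouping is trivial by comparison.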
Why didn’t we hear complaints from our pilot users? From an end user perspective, even the poor response time was a win compared to the old system. In the old system, to look at the 130 Service Lines in a Claim they would have had to hit the page forward key so many times that waiting 10 seconds was significantly faster and easier for them.
Browsers are very capable and the frameworks available for developing complex user interfaces can help build more responsive applications. Encourage your developers to take that next step. It is worth it.
And, make sure you don’t forget the basics. Optimal application performance must be designed into the application at both high levels and low levels. Keeping database accesses to the minimum necessary is a key ingredient to success. Good load & performance testing helps catch those lapses where we forgot something.