DB2 11 Application Developer for z/OS, part 6 - Performance Considerations
I think you'll agree that a good developer should have a reasonable knowledge about the technology he or she is using in the application. Of course it is not possible to know deeply every single detail these days, but is usually useful to understand at least the main principles and to know where to look or where to ask for more details.
The same applies to DB2 developers. They don't need to know every single feature in DB2 nor to understand all bells and whistles. However, they should not only understand why SELECT * FROM TABLE;, then ordering the rows in the application, and retrieving just a subset of the rows is (usually) not a good idea, but moreover, should understand how DB2 processes a query and know different access paths DB2 is using for retrieving the rows. That is the topic of this part of the DB2 11 Application Developer for z/OS certification - Performance Considerations. You should learn here the basic principles and you should be able to find the details when needed. I would point out two sources - Managing Performance guide (where to look), and IDUG DB2-L forum (where to ask).
Below are the links I found useful for the four sections of this part of the test.
Identify examples of accounting trace usage
- First, you need to know how to find possible problems with the SQL, and you should be able to see the differences after any changes. One of the means, which can help you is monitoring DB2 using the DB2 traces. DB2 traces capture information on several events identified by many Instrumentation Facility Component Identifiers (IFCIDs), which are grouped into classes for each trace type. To process and analyze the traces you can use several tools that build on top of this DB2 feature and that can provide details about the data. However, that is not covered in the test.
- The trace we will be using in this section is the accounting The accounting trace records transaction level data and it provides information about the elapsed and CPU times, the number of commits or aborts, how many times a certain SQL statement have been issued, and many other data about resources consumed. Therefore it is helpful especially for application tuning and capacity planning.
- The accounting trace can be started automatically, depending on your SMFACCT subsystem parameter, or it can be controlled using the trace commands. START TRACE command starts the DB2 traces. Refer to the START TRACE documentation for a list of different DB2 traces including the Accounting trace's classes and corresponding IFCIDs.
- There is also a great Redbook - Subsystem and Transaction Monitoring and Tuning with DB2 11 for z/OS. Read Chapter 2 for more details about the traces in general and refer to the Part 3 of the book for transaction monitoring, especially the Chapter 14 with accounting trace overview.
- In the book, you will learn the important terminology for DB2 accounting times - the Accounting Class 1 records the complete elapsed and CPU time (both inside DB2 and in the application), the Accounting Class 2 records the elapsed times spent in DB2 only, and the Accounting Class 3 collects the suspension times.
- You can learn more about DB2 traces in general in the Managing Performance guide, or for an overview of the traces I would recommend Denis Tronin's presentation.
Understand the use of explain data and IBM Data Studio
- I would say that the prerequisite to advanced application tuning is to understand the EXPLAIN statement and the EXPLAIN tables. EXPLAIN statement obtains information about access path selection for an explainable statement (SELECT, MERGE, INSERT, or the searched form of an UPDATE or DELETE) and saves the information into several user tables called EXPLAIN tables:
- PLAN_TABLE is the starting point for access path analysis. It contains the information about the access path for a specified statement (see below for differences in DB2 access paths),
- DSN_STATEMENT_TABLE table (also known as the statement table) contains information about the estimated costs (in ms and su),
- There are several other tables that are appended or updated with new versions of DB2.
- DB2 optimizer is the component that selects an access path for a given SQL statement. The optimizer used in DB2 is cost-based, which means that it tries to select an access path with a lowest estimated cost based on the statistics that are available for tables and indexes. As written above, the Statement table can reveal the estimated cost, but keep in mind this is just an estimate. The important column in the Statement table is the COST_CATEGORY. If it is A, DB2 had enough information when calculating the estimate. If it is B, DB2 had to use default data. In such case the REASON column can provide more details, for example the common problem is the missing statistics.
- To analyze the access path Using EXPLAIN, you perform the following three basic steps:
- The last point in the previous item - interpreting the data - can be extremely difficult if you go beyond basic SQL. However, there are tools that can help you with this task. One of them is IBM Data Studio that can, among the other tasks, help with interpreting the EXPLAIN data. You can download it for free here.
- org provides several blogs and tech talks about the Data Studio, I would recommend at least two - An Introduction to Data Studio and Tuning Queries with IBM Data Studio both by David Simpson.
- Data studio can generate a diagram of the current access plan for an SQL statement using a Visual Explain component of the Data Studio.
- The final diagram consist of several query blocks that represent parent-child relationship by a tree hierarchy.
Identify differences in DB2 access paths
- Depending on the SQL, statistics, system parameters, environment, and other factors, DB2 may select various access paths to access the data. The glossary defines an access path as "The method that is selected by the database manager for retrieving data from a specific table. For example, an access path can involve the use of an index, a sequential scan, or a combination of the two."
- To pass this section of the test you should understand the differences between the available access paths. Good starting point is a set of questions that can help you to identify the access path that was selected for your SQL.
- The details about the access paths can be found in the section about interpreting the EXPLAIN data. I would point out few items:
- Make sure you understand the differences between the table scan access and index access paths.
- Index access is a whole family of paths, and it is good to know what's the difference between the matching index scan, index screening, and non-matching index scan; that DB2 can also use multiple indexes and what are the cases when one fetch index access or the index only can be used.
- When the SQL involves more than one table and combines them, DB2 uses multi-table access paths, which cover:
- Access paths that involve sorting may be very costly, so it is important to understand when DB2 will use a sort access path. Also, remember, that proper indexes can be used for sort avoidance.
- It is worth knowing that DB2 also gives you some options how to manage query access paths.
Identify Stage 1 versus Stage 2 Predicates
- I am pretty sure you've already heard that rows retrieved by DB2 go through two stages of processing - Stage 1 and Stage 2. Predicates that can be applied during the first stage are called stage 1 or sargable (searchable arguments) predicates, the other are called stage 2 or non-sargable predicates. As the link above states: You can improve the performance of your queries by using predicates that can be applied during the first stage whenever possible.
- Whether a predicate is stage 1 or stage 2 depends on several factors, including the syntax and data type. This summary of predicate processing lists the predicate types as well as the processing order. Please note that the list of the predicate types varies with new versions of DB2 so make sure you always look at the correct version.
- The important thing to remember is that all indexable predicates are stage 1, but not all stage 1 predicates are indexable (for an example see here - C1 LIKE %BC is stage 1, but not indexable).