When DB2 Version 1 was released in the mid-1980s, the promise (and myth) of relational databases was that you could code without knowing the structure of the database – no navigation was needed. The reality, then and now, is that you can code with no thought to what indexes exist or how the SQL is written, and your program will work – but it may run very slowly. It is still very common to find code written with no thought to either indexes or best practices.
With each new release, DB2 gets smarter. There is more query rewrite (including transitive closure and converting subqueries to joins), and more predicates are indexable. This improves overall performance even when the developers are not experts. However, DB2 is not yet clairvoyant enough to understand the intentions of all queries and turn them into optimal forms.
This is not a criticism of the Optimizer. When host variables are used (especially with range predicates), DB2 has no way of knowing what value will be used at execution time and whether this will result in an average filter factor, very few rows selected, or most of the table selected. Even if distribution stats and histograms are collected by Runstats, that information cannot be used in this case. In addition, many shops do not collect distribution stats on all columns and all column groups that are used in queries.
That being said, each release of DB2 makes great improvements in query performance. These enhancements fall into a number of categories.
- Engine (including buffer manager, data manager, RDS)
- Faster predicate evaluation
- Faster sorts
- New and improved prefetch. Originally there was just sequential prefetch; now there are also list prefetch, dynamic prefetch, and sequential detection
- New access paths – Originally there were two join methods (nested loop and merge scan). To those were added hybrid join (z/OS), hash join (LUW) and star joins. Multi-index access and range list index scan have also been added.
- SQL Language – new functions or language constructs that improve productivity (the power of the language) and which therefore simplify the query (fewer steps/queries needed) which leads to better performance.
- Multi-row fetch
- OLAP functions
- Query Transformation (also known as query rewrite).
- transitive closure
- subquery -> join transformation
- predicate pushdown
- V11 – functions rewritten as expressions
- Nested table and view merge
- Access Path Selection - better determination of the best access path; more statistics are available (such as histograms) in recent versions of DB2
We usually think of the Optimizer as a combination of Query Transformation and Access Path Selection. The DB2 Optimizer does a wonderful job of selecting the best access path for a query and each release of DB2 does more and more to simplify the coding process and improve performance. It does this by making more predicates indexable, and by query rewrites that enable DB2 to have more choices of access path. For example, without query rewrite a subquery was always processed in a particular order. A correlated subquery would start with the outer table and for each selected row, execute the inner query. A non-correlated subquery would start with the inner query and then process the outer table. But, a join can be done in either order, with the Optimizer determining which order is cheaper. With query rewrite, DB2 can change one form into another.
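As an illustration of the subquery-to-join rewrite described above, consider the following pair of queries (the tables and columns are hypothetical, used only to show the shape of the transformation):

```sql
-- Non-correlated subquery: traditionally DB2 evaluated the inner
-- SELECT first, then processed the outer table.
SELECT O.ORDER_ID, O.TOTAL
FROM   ORDERS O
WHERE  O.CUST_ID IN (SELECT C.CUST_ID
                     FROM   CUSTOMERS C
                     WHERE  C.REGION = 'NE');

-- Equivalent join form: the Optimizer is free to pick either table
-- as the outer table, whichever order is cheaper.
SELECT O.ORDER_ID, O.TOTAL
FROM   ORDERS O
       JOIN CUSTOMERS C ON C.CUST_ID = O.CUST_ID
WHERE  C.REGION = 'NE';
```

Note that the two forms are equivalent only when CUST_ID is unique in CUSTOMERS; otherwise the join could return duplicate rows. This is exactly the kind of condition query transformation must verify before rewriting on your behalf.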
The goal of these improvements is to enable good query performance even when the developer is not an expert SQL coder. We are getting closer to that goal, but even with DB2 11 for z/OS, there are still many queries that will not perform well as written, but can be rewritten manually to achieve good performance. Earlier this month, IDUG published an article by Tony Andrews (Advanced SQL and the Power of Rewriting Queries) showing how different ways of coding a query can result in different access paths and performance. For those of you lucky enough to have attended the IDUG NA 2014 DB2 Tech Conference, you have access to a presentation by Dan Luksetich on Understanding Query Transformation in DB2 for z/OS.
Even as the Optimizer gets smarter, we are also faced with the fact that most of us are not yet on the latest release of DB2, sometimes even more than one release behind. Also, while both DB2 for z/OS and DB2 for LUW are making these improvements, they are not doing so simultaneously. Some enhancements are first introduced on z/OS and some on LUW. You therefore cannot rely on DB2 handling your queries identically in the two environments.
If you learn good coding practices, then you can produce high performance applications from the beginning, thus reducing the amount of tuning and monitoring that will be needed.
Let’s look at some examples.
1) Predicate Transitive Closure. This is based on the logical truth: IF COLA = Literal1 AND COLA = COLB THEN COLB = Literal1. We usually do not include the predicate COLB = Literal1 because it is redundant. However, if there is an index on COLB and that would be the best access path, then having that redundant predicate gives DB2 more information and it can pick that access path. We used to have to code that extra predicate ourselves, but now DB2’s query transformation can add it for us – in most cases. There are, however, still some cases where it does not. LIKE predicates are one example. I have also found that in some complex queries with nested table expressions and subqueries, this transformation is not always done.
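A minimal sketch of transitive closure, using hypothetical tables T1 and T2:

```sql
-- As written: only T1.COLA is directly restricted, so an index
-- on T2.COLB has nothing to match.
SELECT *
FROM   T1, T2
WHERE  T1.COLA = 123
AND    T1.COLA = T2.COLB;

-- With the redundant predicate (added manually, or by query
-- transformation in most cases), an index on T2.COLB becomes
-- a candidate access path:
SELECT *
FROM   T1, T2
WHERE  T1.COLA = 123
AND    T1.COLA = T2.COLB
AND    T2.COLB = 123;
```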
2) Using functions or expressions on a column usually makes that predicate non-indexable (until V11 of DB2 for z/OS).
- YEAR(COLA) = 2005. V11 can rewrite this as COLA BETWEEN DATE('2005-01-01') AND DATE('2005-12-31'). But if you are not on V11, you have to do this yourself if you want DB2 to use an index on COLA.
- COLB + 5 = :HOSTVAR. Recoding it as COLB = :HOSTVAR - 5 will enable index access on COLB. Many expressions of this type can be rewritten by DB2, but not all. In fact, both this type of predicate and predicate transitive closure were first implemented for equal (=) predicates, but not range (e.g. >) predicates.
- UPPER(LASTNAME) = UPPER(:HOSTVAR). This one cannot be rewritten to an indexable form. However, DB2 now has functional indexes (index on expression). If the DBA creates an index using this expression (or the expressions in the previous two examples), then DB2 can use that index if the predicate matches the index definition. Index on expression was introduced in V9 of DB2 for z/OS and V10.5 of DB2 for LUW.
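Side by side, the three rewrites look like this (table, column, and index names are hypothetical; check the syntax against your DB2 release before relying on it):

```sql
-- (a) Function on the column (non-indexable before V11):
--       WHERE YEAR(COLA) = 2005
--     Manual rewrite for earlier releases:
--       WHERE COLA BETWEEN DATE('2005-01-01') AND DATE('2005-12-31')

-- (b) Arithmetic on the column (non-indexable):
--       WHERE COLB + 5 = :HOSTVAR
--     Move the arithmetic to the host variable side (indexable):
--       WHERE COLB = :HOSTVAR - 5

-- (c) No algebraic rewrite exists, so let the DBA create an
--     index on expression instead:
CREATE INDEX IX_EMP_UPPER_LASTNAME
       ON EMP (UPPER(LASTNAME));
-- This predicate can now match the index definition:
--       WHERE UPPER(LASTNAME) = UPPER(:HOSTVAR)
```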
3) Nested table and view merge versus materialization. Originally, a join of a nested table expression to another table would result in the nested table expression being evaluated and materialized in a work file first, and then joined to the rest of the query. DB2 can now often merge the nested table expression into the outer query, as it does with views, and avoid materialization. This is usually a good thing. But sometimes it is better to materialize and force the order of the processing. We can do this by adding a construct that forces a sort to the nested table expression, or by using the RAND() function within it.
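The RAND() technique can be sketched as follows (EMP and DEPTS are hypothetical tables used only for illustration):

```sql
-- DB2 can merge this nested table expression into the outer query:
SELECT X.EMPNO, X.SALARY
FROM  (SELECT EMPNO, DEPT, SALARY
       FROM   EMP
       WHERE  SALARY > 50000) AS X
       JOIN DEPTS D ON D.DEPT = X.DEPT;

-- Referencing RAND() inside the expression blocks the merge,
-- forcing DB2 to materialize X into a work file first:
SELECT X.EMPNO, X.SALARY
FROM  (SELECT EMPNO, DEPT, SALARY, RAND() AS R
       FROM   EMP
       WHERE  SALARY > 50000) AS X
       JOIN DEPTS D ON D.DEPT = X.DEPT;
```

RAND() works here because it is non-deterministic: DB2 cannot merge the expression without risking a different result, so it must materialize it.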
4) Implicit Casting. Until recently, DB2 enforced strong typing. You could not code CHARCOL = 3 or INTCOL = ‘789’; these resulted in errors. For a long time, even different-length operands of the same datatype prevented index access. Luckily we no longer have to worry about joining tables where one join column is CHAR(24) and the other is CHAR(15) (as long as the data functionally matches). Again, this length mismatch was first handled for = predicates and later for range predicates.
Now, DB2 allows implicit casting. You can code CHARCOL = 3. This was done to match industry trends: weak typing is common in scripting languages, and other DBMSs allow it. But you still need to understand what is happening. DB2 will transform charcol = :hostint into cast(charcol as integer) = :hostint (actually DB2 uses DECFLOAT(charcol)). The implicit casting rules specify that CHAR is converted to numeric, not the other way around. If the index is on charcol, the predicate is therefore not indexable. To make it indexable, you should do the casting yourself in the other direction: charcol = cast(:hostint as varchar(9)). Of course, this assumes that your character column is formatted consistently, with no leading spaces.
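The two directions of the cast, side by side (T, CHARCOL, and :hostint are hypothetical):

```sql
-- DB2 effectively rewrites this as DECFLOAT(CHARCOL) = :hostint,
-- putting the cast on the column, so an index on CHARCOL
-- cannot be matched:
SELECT * FROM T WHERE CHARCOL = :hostint;

-- Casting the host variable instead leaves CHARCOL untouched,
-- keeping the predicate indexable:
SELECT * FROM T WHERE CHARCOL = CAST(:hostint AS VARCHAR(9));
```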
What’s the bottom line? Do you need to worry about how you code? The answer is yes, but not as much as in the past. There are 4 steps you should follow as a developer to achieve good performance from the beginning without a lot of tuning effort:
0) Learn as much as you can about DB2 performance, access paths, filter factors, good SQL constructs for different situations, and how to do Explains and understand the Explain output. Be aware of what indexes exist for your tables and the data types of the columns.
1) With this knowledge, code your queries and programs, concerning yourself primarily with correct coding that meets the functional requirements. If you have prepared yourself, it is likely that most of your code will also perform well.
2) Run Explain on all of your programs and queries and check the access path. Most problems can be spotted fairly quickly by looking at the estimated costs (overall, and each node in the Visual Explain). Always do a performance test (i.e. against full volume test tables).
3) If the access path or execution time (elapsed or CPU time) is not good, then it is time for performance tuning. Among other techniques, consider rewriting the query. Perhaps a change from a subquery to a join or other transformation will do the trick (as outlined in Tony’s paper). Redundant predicates are needed less often than in the past, but it never (or rarely) hurts to give the Optimizer more information. Of course, some problems are solved with index changes or the gathering of more statistics, but a change to the code is often the simplest, quickest solution without the overhead of adding a new index.
About the author: Joe Geller is an IBM Champion for Information Management and has been a DB2 consultant since Version 1.2. He’s been a Data Architect, DBA, Data Modeler and Software Engineer, often simultaneously. He feels that the best way to learn one aspect of a system is to work on all aspects. Joe specializes in performance tuning. Joe is the author of two books on database systems – DB2 Performance and Development Guide and IMS Administration, Programming and Data Base Design. Currently Joe is working as a consultant for The Fillmore Group.