DB2 10.5 is best known for BLU Acceleration, which is the new columnar store for tables. Organizing tables by column rather than row can lead to dramatic improvements in elapsed time for analytical queries.
You might think that if your shop is primarily OLTP then you don’t need to upgrade to 10.5 and could just stick with version 10.1 (and many shops are still at 9.7 or 9.5). If that is your thought, you should know that there are many other improvements in version 10.5 covering performance, availability and ease of use.
Any list of favorite performance improvements is naturally very subjective and dependent both on the writer’s needs and past experiences. With that in mind, here are my top 10 performance improvements – mostly in 10.5 but also a few in 10.1 for those of you who haven’t moved up yet. Note: BLU is only number 2 on my list.
1. Index on Expression (also known as functional indexes). These have been in DB2 z/OS since version 9 and have finally made it to LUW in version 10.5 (XML functional indexes were added in 10.1). The most common use case is UPPER(character column). This allows storing the data in mixed case and querying in any case without worrying about matching the case. With a normal index on the column you will not get index access. With an index on expression you will. This is also certainly better than storing an extra column in the table in upper case, creating another index on that column, and making sure that all programs use that new column for the predicate.
2. BLU. Columnar storage enables DB2 to read from disk just the columns that are needed – thus reducing the amount of I/O. It also enables a number of other techniques which working together really make things fly. These include:
- Actionable compression. DB2 can apply predicates on the compressed data which gives a tremendous improvement in CPU time. Since consecutive data (the values) of a column are more similar than in row oriented format, the compression is also improved, with up to 10 X compression ratios.
- Parallelism. BLU can do multi-core parallelism for the processing of a column as well as vector parallelism by processing up to 4 values at a time in a single register.
- Data skipping. DB2 builds a synopsis table which contains the low and high value for a column in each page. This allows DB2 to skip (i.e. not even read) a page that will not satisfy the query predicates.
Speedups of 10 – 100 times have been reported with BLU. Best of all, this is all within DB2, not a separate system for analytical queries (like Netezza or noSQL solutions like Hadoop. You can even join column organized tables and row organized tables within the same query.
3. Temporal Tables (time travel query). These were added in 10.1, but further improved in 10.5. In comparison to manually maintained history, the use of temporal tables and temporal queries can perform much faster, improve developer productivity (simpler to use) and can better ensure the consistency and accuracy of the data.
4. HASH Join keys can have expressions and different data types (V 10.1). For batch processing, HASH joins are often the access path of choice, especially if the tables are not both clustered on the join keys. Prior to V 10.1 HASH JOIN could only be used if the join predicates were of the form COLA = COLB with both columns having the same datatype. With V 10.1 expressions can be used in the join predicate. This will enable DB2 to use HASH joins in many more cases.
5. Index Jump Scan (v10.1) & Read Ahead Prefetching. These are two improvements to index access which can improve performance, reduce the need for index reorgs and possible reduce the number of indexes needed.
- a. Read Ahead Prefetching. When a new entry is inserted into an index page which is full, DB2 has to split that page and move some of the entries to a new page, which is likely to not be nearby. This slows down access. An index reorg will make the index entries and pages consecutive. This new feature improves DB2’s index prefetching so that pages that are not in order do not have as much impact on the scan performance. It is therefore not necessary to do index reorgs as frequently.
- b. Index Jump Scan. If an index has several columns and a query has predicates on (for example), the first and third column, DB2 can only match the first column (i.e. use the index structure to find the first occurrence of a match on that column). It will then have to sequentially scan the index entries for all entries matching that column. If the second column has low cardinality and the third column has high cardinality, DB2 may now choose to use a jump scan access path. It will make multiple probes into the index, one for each value of the second column and using the third column predicate value – effectively skipping over all of the entries that don’t match the third column.Not only will this be more efficient, but if you had created a new index on just columns one and three for the purposes of better handling these queries, you may now be able to drop that index and still get good index performance.
6. pureScale with HADR. pureScale provides both scalability and availability with multiple DB2 servers accessing a common database. HADR is DB2’s premier high availability solution with hot backup servers. You can now use HADR with a pureScale cluster for even greater availability and failover.
7. pureScale Rolling Update and Online Add Member. Of course, with a scalability solution, you want to be able to scale (add more servers) without down time. With V 10.5 you can add new members to the cluster while the system is online. You can also apply maintenance (new fixpacks) one server at a time.
8. Indexes without null keys. Very often an indexed column may have many rows with NULL values for that column. This can prevent you from defining that index as UNIQUE (if the non-null keys are unique). If you don’t need to find rows with a NULL value, then defining the index to EXCLUDE NULL KEYS will reduce the size of the index and allow the index to be defined as UNIQUE. Reducing the size will not only speed up full index scans, but may also reduce the number of index levels, which will improve the access time for all index access. Bufferpool utilization will also be improved.
9. INPLACE Reorg – Cleanup Overflows option. Overflow rows are created when an update increases the length of the row and there is no room on the page for the increased size. The row is moved to another page, with the original page getting an anchor pointing to the new location (indexes still point to the original page). A reorg will clean up overflows. If your only reason for running the reorg (e.g. the table is still well clustered), then you can use the new CLEANUP OVERFLOWS option. This reorg will run much faster than a normal reorg.
10. Adaptive Compression (V 10.1) and INPLACE Reorg with Adaptive Compression (V 10.5). Adaptive Compression is page level compression in addition to the usual table level compression. This leads to better compression ratios, which of course leads to improved performance with fewer I/Os. Unfortunately, in V 10.1, you cannot do an inplace reorg of a table using adaptive compression. Since most shops rely on inplace reorgs for maximum availability, they have not been able to take advantage of adaptive compression. This restriction has been lifted in V 10.5.
Those are my favorites. What are yours?
About the author: Joe Geller is an IBM Champion for Information Management and has been a DB2 consultant since Version 1.2. He’s been a Data Architect, DBA, Data Modeler and Software Engineer, often simultaneously. He feels that the best way to learning one aspect of a system is to work on all aspects. Joe specializes in performance tuning. He once rewrote a PeopleSoft view and improved the query by a factor of 70,000. Joe is the author of two books on database systems – DB2 Performance and Development Guide and IMS Administration, Programming and Data Base Design. Currently Joe is working as a consultant for The Fillmore Group.