Mark Gillis and Iqbal Goralwalla, Triton Consulting
According to a recent Gartner report, data growth is the biggest data center hardware infrastructure challenge for large enterprises. As such, reducing storage-related costs remains a top priority for these organizations.
Over the past releases, DB2 on LUW has introduced various features that help towards reducing storage costs. This is depicted in the graphic below:
Since DB2 8.1 there has been a feature called value compression. This feature is not well-known, but it provides a cost-efficient way to reduce the size of your database with almost no impact on performance. No disk storage is consumed for NULL column values, zero length data in variable length columns or system default values.
Database Backup Compression, resulting in smaller backup images was also introduced in DB2 8.1, meant storage space that is consumed by administrative data could be controlled by compressing your backup images.
Multidimensional Clustering, again introduced in DB2 8.1 allowed for significant index compression to be achieved through block indexes. One key per block (thousands of records) vs one key per record with traditional indexes.
Multi-Temperature Data Management
Multi-temperature data storage options were introduced in DB2 10.1. This allows you to prioritize your data storage options. Rather than trying to provide optimal data access to the full range of data, by storing all of a table’s data in quickly accessed (and expensive) formats such as SSD, the data can be grouped according to how often it is accessed. Current (hot) data that usually needs to be accessed very fast can be stored on expensive storage formats such as SSDs, whilst the rest of the less frequently accessed data can be placed on cheaper and slower mechanisms like SATA or tape devices. This reduces storage costs and TCO.
Row compression has been available since DB2 9.1 and has been enhanced in subsequent releases.
A dictionary-based compression algorithm is used where data rows are compressed by replacing patterns of values that repeat across rows with shorter symbol strings. Of the various data compression techniques available in DB2, row compression offers the most significant possibilities for storage savings. Rows are also compressed in the transaction log records and also in the backup images. As such savings are not only realized on fewer disks for the data, but also extended to transaction log and backup disk space. Future expenditures for additional storage can be delayed and the data center can benefit from less energy and power consumption.
By storing compressed data on disk, fewer I/O operations need to be performed to retrieve or store the same amount of data. Therefore, for disk I/O-bound workloads, the query processing time can be noticeably improved. The diagram below shows the storage cost savings as a result of row compression.
Adaptive Compression introduced in DB2 10.1 is an enhancement to the Classic Row Compression. Rows are compressed by using a combination of a global static table level dictionary and local page level dictionaries. The big advantage here is that page level dictionaries adapt to data skew over a period of time. As such, DB2 is able to achieve higher compression ratios than classic row compression, and maintain these over time without table reorganization.
DB2 10.5 with BLU Acceleration takes compression to another level. Multiple compression techniques are combined to create a near optimal compression strategy for column-organized tables. This results in a significant reduction in disk space even compared to Adaptive Compression. The resulting lower storage costs makes it easier to manage data marts on a single server instead of the overhead of managing a cluster or logical partitions. Furthermore, columnar tables in DB2 10.5 do not need secondary indexes or performance enhancing objects such as MQTs, resulting in further savings in storage costs.
Results vary between applications and database designs, but customers typically report in the region of 10x better compression over previous releases.
Our own experiments show that a table with around 10 columns, mostly using VARCHAR, can be compressed to give around 70% savings in disk space if Adaptive Compression is invoked. However, if the table is converted to columnar organization, savings of more than 90% could be realized in disk space as shown in the table below:
In a scenario where a database is approaching the limits of its file-system, an upgrade to DB2 10.5 can solve the current performance problems of a database as well as avoiding the expense of acquiring extra storage.
DB2 10.5 allows data to remain compressed in memory as well as on disk (as Row Compression and Adaptive Compression do) but also allows predicates to be applied without de-compressing the data. This, combined with the better compression ratios available in 10.5, mean more efficient memory usage. Clearly this has a performance benefit but will also obviate the need for more memory to be made available. Typically DB2 will “rob Peter to pay Paul”: reducing the memory heap for one function in order to make more available for another. And data stored in bufferpools will be flushed out more regularly if the memory limits are being hit. One solution to this is to purchase more memory. Another is to upgrade to DB2 10.5.
DB2 10.5, FixPack 4 (Cancun) introduces Shadow Tables; automatically maintaining a set of column-organized tables (which are compressed by default) for OLAP processing alongside the OLTP tables. This avoids the storage, hardware and maintenance costs involved in purchasing separate data warehouse servers to host the OLAP database.
And because the ‘super-compressed’ data stays in memory even when analytical workloads are querying it because of the ability to apply predicates without de-compression, the overhead of analytical data flooding the database server memory can be avoided and there will be significant CPU savings. No extra data storage purchases, no extra memory upgrades and reduced CPU.
Bringing it all together
Imagine a scenario like this: a single instance on a single server has a transactional (OLTP) database and an analytical (OLAP) database that is populated directly from the transactional source. The analytical data is summed and grouped, so will occupy a smaller area of storage than the OLTP data for a given timeframe. But the analytical data needs to have historical data going back much further than the OLTP database, so its storage requirements might even be more than that of the OLTP database.
And both the OLTP and the OLAP requirements dictate fast access to data; immediate Insert, update and Delete transactions being needed on the OLTP side and instantaneous analytical reporting being needed on the OLAP side.
Remember that the OLAP data is conventional row-based storage and so needs a set of indexes of its own to enable performance.
From the above set up, we can observe the following:
- All data is being stored on fast (expensive) storage, e.g. SSD in order to provide the required performance needed in both the OLTP and OLAP systems
- Although he current OLAP data is aggregated, the footprint of the data is larger than the OLTP data due to the requirement to store historical data. This, too, is on expensive storage mediums, as the data is not segregated
- Both OLTP and OLAP are fighting for memory usage
- Projected data growth is going to exacerbate all of the above
Now consider the cost benefits that the same scenario might gain using the DB2 Multi-Temperature storage features
We can now see that:
- Multi-Temperature storage enables cheaper storage options to be aligned to less frequently accessed data (SSD, SAS, SATA, Tape) but this needs to be based on range partitions.
- Adaptive compression reduces the footprint of the data storage, thereby extending the life of your existing storage options and obviating the need for immediate purchases of further storage
Now, if we invoke the 10.5 FP 4 Cancun features, we get this additional benefit
- Actionable compression drastically reduces the footprint of the data storage, thereby extending the life of your existing storage options and obviating the need for immediate purchases of further storage
- Data remains compressed in memory, obviating the need for further memory resources
- Defining the OLAP data as a Shadow Table not only invokes deep compression and uses compressed data in memory, but eliminates the need for manually defined Indexes (and other high performance objects, like MQTs and MDCs), thereby dramatically reducing the data footprint. NB Multi-Temperature Data storage cannot be used in this solution because it demands the use of partitioning and Shadow Tables cannot be based on a partitioned table
Whilst this is a fairly generic and simplistic illustration it does show that the features now available in DB2 confer not only significant performance improvements, but genuine cost savings; either by generating headroom in storage and memory resources that means further purchases are not required for the time being, or by allowing better usage of existing storage and memory resources thereby freeing up existing purchases for use elsewhere in the enterprise.