Meet “the Crusher” - or how we learned to love query acceleration

IBM’s Db2 Analytics Accelerator (formerly known as “IDAA”) has been available for over 8 years, growing out of IBM’s 2010 acquisition of Netezza. However, many Db2 professionals still don’t have exposure to this technology. To provide an overview of our experience with this amazing tool, I will describe our approach, some ways to think about Accelerator use cases, and some examples of the resource savings we’ve obtained over the last 3+ years.

Some background: we are a large financial organization that has used Db2 for over 30 years for both operational and reporting applications. About 25 years ago we created a reporting-only subsystem and have built a variety of data warehouses and data marts in this environment. As it grew, with billion-row fact tables and scores of dimension tables accessed by ever-changing analytical tools, it became harder to manage. From both a machine-resource and a human-resource perspective, this environment was a constant headache. Our modest DBA staff struggled to keep up with complex SQL using bad access paths (generated SQL we could not change) and with ETL tools extracting data to move it somewhere else for further processing.

We were fortunate, however: in the spring of 2014 we had the chance to take a free introductory course on IDAA V4, offered by The Fillmore Group in Baltimore (note this class is still offered; see this link for info on the current version of the class).

We came away impressed, as the Data Studio plug-in for the Accelerator made it easy to add and load tables, view query history, and monitor activity. The following year we acquired a V4 accelerator, and after a period of experimentation and implementation, we were off to the races.

A big part of our experimentation phase was figuring out which use case best fit our need to offload cycles (general CPU and zIIP) and reduce getpages. Keep in mind that the Accelerator holds a copy of the data, so once the optimizer routes a query there, Db2 resources are not used to produce the result.
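That routing is governed by the CURRENT QUERY ACCELERATION special register (a system-wide default can also be set via the QUERY_ACCELERATION subsystem parameter). As a rough sketch of the common settings, with behavior hedged to our understanding:

```sql
-- Typical special-register settings, set per connection or application:
SET CURRENT QUERY ACCELERATION = NONE;                 -- never route; run everything in Db2
SET CURRENT QUERY ACCELERATION = ENABLE;               -- route eligible queries to the Accelerator
SET CURRENT QUERY ACCELERATION = ENABLE WITH FAILBACK; -- route, but fall back to Db2 on certain failures
SET CURRENT QUERY ACCELERATION = ALL;                  -- route all eligible queries; ineligible ones fail
```

Which queries count as "eligible" depends on the Db2 and Accelerator versions, so it is worth checking the documentation for your release.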

Use cases? Yes, from a very general, high-level perspective there are at least four use cases for exploiting this advanced technology. This is an oversimplified, easy way to categorize the options:

  1. Accelerate queries for low-concurrency reporting needs (i.e., post-batch cycle)
  2. Accelerate queries for high-concurrency reporting or operational needs, via CDC
  3. Data archival (offline or online)
    and, more recently,
  4. “HTAP” – hybrid transaction/analytical processing, with no latency via V7 advances

Our big initial challenge was deciding how to keep the data current and how much latency we could tolerate. In V4 the choices were:

  1. batch loads
  2. Change Data Capture (CDC) updates from log records
  3. some combination of these two options (HTAP was not available at that time)

We struggled with both options initially, particularly with CDC. In the short term, however, we were able to test and build out our data in the Accelerator via the Data Studio UI feature for loading tables. We had no CDC expertise, as we didn’t use IBM replication tools in our Db2 environment, but we learned enough to test the feature. We found the replication process cumbersome to administer, but it worked fine.

We ultimately decided on “Use case 1”, the simplest option: load data nightly as part of our long-established batch warehouse update process. However, this required some sort of Accelerator batch load utility.

We were planning to use the provided load (“sample”) program and JCL, but it required a C compiler, which we no longer used. Since we did not initially plan to buy the powerful third-party Loader tool, we realized we had to use a wrapper program to invoke the load functions available via the stored procedure API. Eventually we customized our own Rexx process for this, which we still use. For some details, see my 2018 IDUG presentation here.
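At the heart of any such wrapper is a CALL to the Accelerator’s load stored procedure. The sketch below is illustrative only: ACCEL1, MYSCHEMA, and FACT_SALES are placeholder names, and the exact XML shape varies by Accelerator version, so consult the stored procedure reference for your release.

```sql
-- Illustrative sketch: load one table via the stored procedure API.
CALL SYSPROC.ACCEL_LOAD_TABLES(
  'ACCEL1',                     -- accelerator name (placeholder)
  'TABLE',                      -- lock mode while unloading from Db2
  '<?xml version="1.0" encoding="UTF-8"?>
   <aqt:tableSetForLoad xmlns:aqt="http://www.ibm.com/xmlns/prod/dwa/2011">
     <table name="FACT_SALES" schema="MYSCHEMA"/>
   </aqt:tableSetForLoad>',     -- XML list of tables (optionally partitions) to load
  NULL);                        -- message output area
```

A wrapper in Rexx (or any language that can issue SQL CALLs) mainly adds error handling around this call and inspects the returned message area.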

Why “The Crusher”? 

From the very first day we were able to run test queries in our environment we have been astounded at what this tool can do. After loading some initial test data, we proceeded to run this as our first test:

SET CURRENT QUERY ACCELERATION = ALL;
SELECT COUNT(*) FROM TB1, TB2;

What could be simpler?

It produces a Cartesian product that forces the Db2 optimizer to use huge amounts of Db2 workfile space or perform many thousands of redundant table scans. Except Db2 shared resources were not being used: the Accelerator produced the answer in seconds, with no Db2 CPU, getpages, or Db2 workfiles. Hence the nickname.
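One way (as we understand it) to confirm that a query was actually routed is to EXPLAIN it and check PLAN_TABLE, where an ACCESSTYPE of 'A' indicates the query was sent to the Accelerator. TB1, TB2, and the QUERYNO value are placeholders:

```sql
-- Confirm routing: EXPLAIN the query, then inspect PLAN_TABLE.
SET CURRENT QUERY ACCELERATION = ALL;
EXPLAIN PLAN SET QUERYNO = 101 FOR
  SELECT COUNT(*) FROM TB1, TB2;

SELECT QUERYNO, ACCESSTYPE          -- ACCESSTYPE = 'A' means accelerated
  FROM PLAN_TABLE
 WHERE QUERYNO = 101;
```

The Accelerator’s own query history (visible in the Data Studio plug-in) provides a second, after-the-fact check.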

Using a completely different architecture from Db2, with parallel processors, parallel disk arrays, and specialized CPU engines, read-only query performance now looked spectacular. And this does not even consider the newer advances in the V7 technology.

Some examples of results from real queries, extracted randomly from our Accelerator and rerun as Db2 warehouse queries, follow:

Example 1: A 40-line query to sum and group 2 years of data ran for 1 hour in Db2 from TSO before I killed it. In that time it ran 3 parallel threads, each doing work. The parent thread used 7 minutes of CPU, about 12 million getpages, and over 600,000 prefetch requests. Each child thread used 40 minutes of CPU and 44 million getpages. In total they consumed 87 minutes of CPU and about 100 million getpages, and still had not produced a result!

In the Accelerator this SQL returned 150 rows in 2 minutes 38 seconds of elapsed time and less than 0.1 seconds of Db2 CPU time. About 500 getpages were used to fetch and return the result to TSO.

Example 2: I killed this query at 15 minutes. It was another rollup from a large fact table, with joins and date-predicate filtering. It also ran in parallel in Db2, racking up 10 million getpages in 40 CPU seconds, along with 1.7 million synchronous I/Os.

In the Accelerator the 10-row result came back in 4 seconds, using 200 getpages to fetch the result.

Example 3: A 1-month rollup from a large fact table, with an outer join, several inner joins, a subquery, and a GROUP BY. I killed this one at 15 minutes, also running with a child thread in parallel. It had racked up 11.5 million getpages in 1 minute of Db2 CPU time, with 1.9 million synchronous I/Os and 750,000 prefetch requests.

In the Accelerator the 29-row result came back in 1 second and used 250 getpages.

As these admittedly incomplete examples show, the Db2 resources saved add up quickly. This is not to say we never have query issues in the Accelerator; we do occasionally have bad “runaway” queries that must be killed. Overall, though, the machine resources and DBA tuning time saved have proven the Accelerator to be a very good investment for us.


Think big

July 18, 2019 07:14 AM by Gregory Debo

As a large IDAA user, I love to see these stories of huge performance gains, but I challenge everyone to think bigger. With some of the newer features in V5 (and hopefully V7 soon), like Federation, you can now do things in IDAA that you can't even do in Db2. So if, like us, you are one of the companies that previously copied all that data to a Data Lake to federate it, you can now do that without leaving the platform: reduce cost, reduce risk, reduce latency, and provide a much quicker solution for your users.
