SQL Pagination – Data-Dependent (New Syntax) and Numeric-Based

Posted By: Tony Andrews Technical Content,

SQL Pagination – New Syntax

PROBLEM STATEMENT

One of the oldest problems in Db2 performance was the efficient processing of pagination. Pagination is a programming paradigm in which an online application needs to process an indeterminate number of rows, but does not want to fetch them all in before presenting them to the user. Instead, it fetches a fixed number, returns these to the user, and then pages forward at the user’s request. In particular, this is a best practice for CICS transactions (browser based applications present a new set of challenges because the UI makes it easy for the user to scroll forward, rather than page forward).

The second problem with pagination is that the correct SQL code is somewhat cumbersome and error prone.

HOW DO WE CODE PAGINATION?

The typical SQL to read the next set of rows would be to open a cursor similar to:

SELECT cols FROM T1
WHERE (KEY1 = :LAST_KEY1 AND KEY2 > :LAST_KEY2   
                 OR KEY1 > :LAST_KEY1)
ORDER BY KEY1, KEY2
 FETCH FIRST 20 ROWS ONLY  OPTIMIZE FOR 1 ROW

 

The performance issues prior to Db2 z/OS V10 were due to the processing of an OR between 2 predicates. This prevented efficient single index access. There are variations to this code that gave slightly better performance, but not good enough. Db2 V10 solved this problem (actually the smart guys on the Db2 Optimizer team solved it), by introducing SQL Pagination. This is also known as Range List Index Access. The Optimizer now recognizes queries that have multiple predicates against the same table and can use single index access with multiple ranges.

Although the performance is now what you would like it to be, the syntax for this query is not so intuitive, especially for inexperienced programmers. It gets even worse with more than 2 key columns – the number of predicates grows quickly. When you think about the requirement, you are thinking “start with the next key value and then go forward in the index”. Db2 V12 introduces a new syntax for pagination that says pretty much that, and is much simpler to use:

Data–Dependent Pagination

WHERE (KEY1, KEY2) > (:LAST_KEY1, :LAST_KEY2)

 

This is called a row-value-expression. Not only is this easier to code, but it reads more intuitively (once you get used to it). You can also use >= if there are duplicates with the set of key values. And, if you need to scroll backwards, you can use < or <<<==<==<====. 

Db2 will take this syntax and rewrite it (during query rewrite) to the OR syntax, which does get processed efficiently today.

Here is an example (using the Db2 z/OS catalog) with a 3 column key. The goal is to scroll through all of the columns of all of the tables in the database:

SELECT TBCREATOR, TBNAME, NAME, COLNO, COLTYPE, LENGTH, SCALE, NULLS
 FROM SYSIBM.SYSCOLUMNS
 WHERE (TBCREATOR = :LAST_TBCREATOR AND TBNAME = :LAST_TBNAME AND NAME > :LAST_NAME)
        OR  (TBCREATOR = :LAST_TBCREATOR AND TBNAME > :LAST_TBNAME)
       OR  TBCREATOR > :LAST_TBCREATOR
 ORDER BY TBCREATOR, TBNAME, NAME
 FETCH FIRST 20 ROWS ONLY  OPTIMIZE FOR 1 ROW

The Solution Provided in Db2 12 for z/OS and in Db2 LUW

The new syntax is:

SELECT TBCREATOR, TBNAME, NAME, COLNO, COLTYPE, LENGTH, SCALE, NULLS
FROM SYSIBM.SYSCOLUMNS
WHERE (TBCREATOR, TBNAME, NAME) > (:LAST_TBCREATOR, :LAST_ TBNAME, :LAST_NAME)
ORDER BY TBCREATOR, TBNAME, NAME
FETCH FIRST 20 ROWS ONLY OPTIMIZE FOR 1 ROW

 

Note: it is important to use the OPTIMIZE FOR clause to tell Db2 to assume you are only going to fetch a small number of rows. Otherwise, the result set could potentially be very large and Db2 may choose a different access path that would be better for a large result set. In the case of pagination, you want an access path that uses the index and does not do any sorting so that Db2 does not have to access every qualifying row. On the other hand, when you are planning to fetch all qualifying rows, then you want Db2 to pick the best access path for that pattern and you would not want to use the OPTIMIZE FOR clause.

Numeric-Based Pagination

 

There is another new way in Db2 12 and in Db2 LUW to fetch the next set of rows. This is the OFFSET clause:

OFFSET 20 ROWS FETCH FIRST 20 ROWS ONLY

 

This tells Db2 to skip the first 20 rows and then start fetching. This of course is better than the application fetching from the beginning of the rows and discarding the first 20. If done by the application, there would be 40 fetches for the second page, 60 fetches for the third page, and so on. With the OFFSET clause, the skipping is done by Db2 and there are only 20 fetches and 20 rows returned to the application. However, even though the earlier rows are not returned to the application, the still have to be read by Db2 and therefore the larger the offset, the longer it will take. This syntax is very simple and the larger the offset, the faster the result compared to the application discarding rows. But, it is not nearly as fast as the new data dependent syntax which takes full advantage of the index to quickly get to the first qualifying row.

The specified offset can either be a numeric literal or a variable (anything castable to BIGINT). The ability to use a variable makes it easy for the program to page forward repeatedly. You would normally also use an ORDER BY (the key columns) and the FETCH and OPTIMIZE clauses to ensure that the same access path was used for each iteration.

OFFSET is important for compatibility with other DBMS (including Db2 LUW). Other than that, it has the advantage of simpler code. Instead of a program keeping track of the set of key values of the last row fetched for the previous page, it only has to keep track of how many rows it has fetched and increment an OFFSET host variable. Simpler, quicker coding is nice, but since the performance is not as good as data-dependent pagination, it is better in most cases to use that method. The new row-expression syntax for pagination makes this only slightly more effort that using OFFSET. Another big disadvantage of OFFSET is that even with an ORDER BY and the same access path, if another transaction has inserted or deleted rows which you have already processed, the next page may repeat or miss a row.