SQL Sparse Indexing for Db2 z/OS

Sparse indexing has been around for a number of versions. The history of these on the z/Os side goes back to originally helping with non-correlated subqueries in V4, then star joins in V7, to hashing support of these indexes in V10.  

Since V10, I have seen more and more use of sparse indexes, particularly in conjunction with nested loop joins to improve SQL performance. This is what it looks like in a visual explain graph. 

SELECT LASTNAME, DEPTNAME
FROM EMP E
JOIN DEPT D
  ON E.DEPTNO = D.DEPTNO
WHERE E.SALARY > 45000
;

Sparse-image1.png

In this example, both tables have an index on the DEPTNO column and a typical nested loop join would use the index to probe each DEPTNO value (executing GETPAGES within the index) and then executing GETPAGES to check other predicates in the tablespace.

Sparse indexing works very much like hash joins that we often see in Db2 LUW joins. In this example, Db2 has chosen the (DEPT) as the inner table. The inner table is read first applying any local predicates and Db2 building the sparse index. The sparse index result from the inner table is built in memory if the optimizer thinks it can fit (controlled by zParm MXDTCACH, default is 20MB). The outer table (EMP) is then processed applying its local predicates, with the join column(s) being hashed and then the lookup in the sparse index.  

Note: Performance may degrade when the sparse index does not fit into the allocated memory and spills into a work file.  If this happens regularly then MXDTCACH should be increased.

So why the sparse index here? The optimizer does some costing to determine if there will be enough rows from the outer table qualified to payback the cost of building and hashing the sparse index on the inner table. In this example, it looks like Db2 is guessing that 17,630 lookups to the hash table will take place. The hashing of values from the outer table and the subsequent lookup (matching) in the sparse index is very fast and outweighs any probing of an index from the inner table.     

Sparse indexes are shown in the visual explain by the SIXSCAN node (sparse index scan) with the estimated number of rows going into the sparse index shown by then number on the WORKFILE node. In this example 113 entries are estimated.  

It is also noted in the PLAN_TABLE as PRIMARY_ACCESSTYPE = ‘T’. For this example, the Plan_Table output looks like this: 

PLANNO  METHOD  CREATOR   TNAME      ACCESSTYPE  PRIMARY_ACCESSTYPE
------  ------  --------  ---------  ---------- ------------------
   1       0  THEMIS81  EMP              R      
   2       1   THEMIS81  DEPT              R                   T                 

‘T’ Means: The base table or result file is materialized into a work file, and the work file is then built and accessed via sparse index built at runtime.  

I have also seen sparse indexes built for materialized tables specific to nested and common table expression. Without a sparse index built, the optimizer would typically default to a merge scan join between the two tables or use the nested table as the outer table in another join. If Db2 determines the materialization of the table expression fits in memory, and for reasons stated above it builds a sparse index on the work file.   

SELECT D.DEPTNO, D.DEPTNAME,
  E.NUM_EMP, E.TOT_SAL
FROM THEMIS81.DEPT D,
( SELECT DEPTNO,
COUNT(*) AS NUM_EMP,
SUM(SALARY) AS TOT_SAL
FROM themis81.EMP
GROUP BY DEPTNO ) AS E
WHERE D.DEPTNO = E.DEPTNO
;

Sparse-image2.png

I have also seen more sparse indexing in the materialization of work files with non-correlated subqueries. In the following example, the optimizer decides to put the values into a work file, and then build a sparse index on that file. In this example, it thinks only 107 entries will be needed in the sparse index.  

SELECT DEPTNO
FROM THEMIS81.DEPT
WHERE DEPTNO IN
( SELECT DEPTNO
FROM THEMIS81.PROJ
WHERE PROJNO LIKE 'M%' )
;

Sparse-image3.png

In summary, I have seen only good results from the optimizer helping the queries by building a runtime sparse index. On Db2 LUW these types of queries typically result in hash joins.

1 Like
Recent Stories
IBM Db2 Analytics Accelerator can now be deployed on IBM Z

Modern indexes for modern data

Bring Intelligence to Where Critical Transactions Run – An Update from Machine Learning for z/OS