Db2 Recursive SQL in the Real-World

Introduction

You can find a ton of articles about recursive SQL dating back as early as 2004, but for this article I was asked to provide actual real-world situations where recursive SQL was put to use. In all of the situations it was a great success! First, however, a refresher.

Recursive SQL in Db2 is enabled with a self-referencing common table expression. A common table expression is like a temporary view used for the duration of a SQL statement. Here is a primitive example.

WITH TABLE1(COL1) AS              -- Common table
(SELECT 1 FROM SYSIBM.SYSDUMMY1) -- Expression
SELECT COL1 FROM TABLE1; -- Reference is common table expression
                          -- (this is all one SQL statement)

The WITH clause identifies the table name and column names, and the SQL expression within the parenthesis identifies the common table, which can then be referenced in subsequent expressions similar to any other table references.

Db2 recursive SQL is not true recursion in that it’s not enabled by a function that invokes itself, but rather a common table expression that references itself. So, it is much more a looping mechanism than true recursion, and thus when coding these types of common table expressions certain rules should be followed to avoid a SQL warning and infinite loop. A column should be added as a counter, and a stop predicate should be utilized to avoid an infinite loop, preferably against the counter column. Here is a basic recursive statement that produces a count from 1 to 10.

WITH COUNTER_TABLE (N) AS
(SELECT 1 FROM SYSIBM.SYSDUMMY1
UNION ALL
SELECT N+1 FROM COUNTER_TABLE    -- Self-reference
WHERE N < 10)                    -- Stop predicate   
SELECT N FROM COUNTER_TABLE;

Db2 recursion is most useful for traversing networks and hierarchies within a database schema, such as a bill of materials table. It’s also useful for generating test data and statements. You can find more examples of recursive SQL in articles by following these links.

http://ibmsystemsmag.com/blogs/db2utor/archive/recursive-sql-s/ 
http://www.craigsmullins.com/bp5.htm
https://www.idug.org/p/fo/et/thread=47420

If you have IDUG Premier membership you can also view many presentations regarding recursive SQL. The Db2 manuals also have some excellent examples. What follows here are some real-world examples of the use of recursive SQL in my experiences at various customer sites.

Key Generation

A Db2 for Linux customer had a problem with the batch performance of their fraud detection application. They had two ways of accepting input into their system, one of which was via online single transactions and the other view file submission to their nightly batch application. The application server and the database server were on separate physical machines. The main identifier for their primary entity was generated via a Db2 sequence object. They used a statement such as this to generate the sequence value prior to processing the data.

VALUES (SEQ1);

While perfectly effective for the only addition of a single transaction’s worth of data. It proved to be a bit of a bottleneck during batch processing of thousands of inputs due to the fact that every execution was a network call. They had contemplated increasing the increment value of the sequence object, but then that would waste significant value when used online. The solution was a recursive statement.

WITH GEN_KEYS(N) AS
(VALUES (1)
UNION ALL
SELECT N+1 FROM GEN_KEYS
WHERE N < 1000)
SELECT NEXT VALUE FOR SEQ1
FROM GEN_KEYS;

 This statement, when used in batch, would generate 1,000 sequence values and because they were using block fetching that were able to get all 1,000 values in a single network call.

Traversing Relationships in the Db2 System Catalog

Sometimes you need to find the relationships between objects in a database schema. Recently a customer of mine decided that they needed to purge old data in the database. They were not prepared or willing to write an application to do the purge, and so they asked the DBAs to develop a purge script instead. One of the challenges in setting up the script was to make sure that deletes of data occurred in the proper sequence relative to database enforced referential integrity. This statement helped with that investigation. The commented-out portions are an attempt to reduce redundancy in the final output, but be aware that they can eliminate circular references. A circular reference is one in which a table references itself via foreign key, or a table has a foreign key referencing another table which, in turn, has a foreign key reference back to the first table. This type of relationship is illustrated in the next section about billing application hierarchy. So, uncomment out those sections of following SQL only after understanding the relationships that are displayed in the output.

WITH RELS (LEVEL, CREATOR, TBNAME, REFTBNAME, REFTBCREATOR, NETWORK) AS
(SELECT 1, CREATOR, TBNAME, REFTBNAME, REFTBCREATOR
       , CAST(REFTBNAME || '<--' || TBNAME AS VARCHAR(2000))
  FROM SYSIBM.SYSRELS
  where creator = 'DSN81110'
  AND   (REFTBCREATOR, REFTBNAME) NOT IN
        (SELECT CREATOR, TBNAME
         FROM SYSIBM.SYSRELS
         WHERE  CREATOR != REFTBCREATOR
         AND    TBNAME  != REFTBNAME)
UNION ALL
SELECT R.LEVEL + 1, S.CREATOR, S.TBNAME, S.REFTBNAME, S.REFTBCREATOR
       , CAST(R.NETWORK || '<--' || S.TBNAME AS VARCHAR(2000))
  FROM       SYSIBM.SYSRELS S
  INNER JOIN RELS R
  ON         S.REFTBNAME  = R.TBNAME
  AND        S.REFTBCREATOR = R.CREATOR
  WHERE      R.LEVEL < 10
  AND        S.REFTBNAME || S.REFTBCREATOR != S.TBNAME || S.CREATOR
  AND        LOCATE(S.TBNAME,R.NETWORK) = 0
)

SELECT DISTINCT X.LEVEL, X.CREATOR, X.TBNAME,
X.REFTBNAME, X.REFTBCREATOR, X.NETWORK
FROM   RELS X
--WHERE   NOT EXISTS
--       (SELECT 1
--        FROM   RELS Y
--        WHERE  LOCATE(X.TBNAME,Y.NETWORK) > 0
--        AND    Y.LEVEL > X.LEVEL)
--OR     NOT EXISTS
--       (SELECT 1
--        FROM   RELS Y
--        WHERE  LOCATE(X.REFTBNAME,Y.NETWORK) > 0
--        AND    Y.LEVEL > X.LEVEL)
ORDER BY X.LEVEL, X.CREATOR, X.TBNAME, X.REFTBNAME, X.REFTBCREATOR; 
WITH RELS (LEVEL, TABSCHEMA, TABNAME, REFTABSCHEMA, REFTABNAME, NETWORK) AS
(SELECT 1, TABSCHEMA, TABNAME, REFTABSCHEMA, REFTABNAME
       , CAST(REFTABNAME || '<--' || TABNAME AS VARCHAR(2000))
  FROM SYSCAT.REFERENCES
  where TABSCHEMA = 'DB2INST1'
  AND   (REFTABSCHEMA, REFTABNAME) NOT IN
        (SELECT TABSCHEMA, TABNAME
         FROM SYSCAT.REFERENCES
         WHERE  TABSCHEMA != REFTABSCHEMA
         AND    TABNAME  != REFTABNAME)
UNION ALL
SELECT R.LEVEL + 1, S.TABSCHEMA, S.TABNAME, S.REFTABSCHEMA, S.REFTABNAME
       , CAST(R.NETWORK || '<--' || S.TABNAME AS VARCHAR(2000))
  FROM       SYSCAT.REFERENCES S
  ,          RELS R
  WHERE         S.REFTABNAME  = R.TABNAME
  AND        S.REFTABSCHEMA = R.TABSCHEMA
  AND        R.LEVEL < 10
  AND        S.REFTABNAME || S.REFTABSCHEMA != S.TABNAME || S.TABSCHEMA
  AND        LOCATE(S.TABNAME,R.NETWORK) = 0)

SELECT DISTINCT X.LEVEL, X.TABSCHEMA, X.TABNAME,
X.REFTABSCHEMA, X.REFTABNAME, X.NETWORK
FROM   RELS X
--WHERE   NOT EXISTS
--       (SELECT 1
--        FROM   RELS Y
--        WHERE  LOCATE(X.TABNAME,Y.NETWORK) > 0
--        AND    Y.LEVEL > X.LEVEL)
--OR     NOT EXISTS
--       (SELECT 1
--        FROM   RELS Y
--        WHERE  LOCATE(X.REFTABNAME,Y.NETWORK) > 0
--        AND    Y.LEVEL > X.LEVEL)
ORDER BY X.LEVEL,  X.TABSCHEMA, X.TABNAME, X.REFTABSCHEMA, X.REFTABNAME;

Billing Application Hierarchy

I used to joke “what’s infinity in the database world? The answer, three days”. This was inspired by a billing application on Db2 for AIX. The application basically ingested about 2 million purchases in real-time during the course of a month by about 500,000 customers, and we spent most of our time during development on performance of the real-time transactions. No time was spent on the end of the month generation of the invoices until the application was in production. Then the call came in, “There’s something wrong with Db2. Billing has been running for three days”. A quick review of a snapshot monitor showed the same five statements running over and over again. They looked something like the following.

SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE1;
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE2;
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE3;
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE4;
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE5;

While the statements seemed simple there were 2.5 million of them, and only about 120,000 customers had activity that required an invoice. So, the simple solution was to merge the statements together.

WITH CUSTBILL(CUSTID, AMT) AS
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE1
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE2
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE3
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE4
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE5)

SELECT CUSTID, SUM(AMT) FROM CUSTBILL;

Billing went from a three-day process to a 1-minute process. Then the call came in, “There’s something wrong with Db2. Billing has been running for three days”. A quick review of a snapshot monitor showed the same five statements running over and over again. It turns out that customers were in groups, and a group was assigned to a customer, and that customer could be in a group, and all the money needed to be reported along every level of the customer hierarchy! Here is a simple view of the schema.

pic1.png

The schema should Group, owned by a customer, in a cyclical relationship with Customer, which formed a hierarchy. This is a perfect candidate for recursive SQL.

WITH CUSTBILL(CUSTID, AMT) AS
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE1
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE2
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE3
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE4
UNION ALL
SELECT CUSTID, SUM(AMT) FROM BILLING_TABLE5)
,CUSTGROUP(CUSTID, AMT) AS – second common table expression
(SELECT CUSTID, SUM(AMT) FROM CUSTBILL
UNION ALL
SELECT C.CUSTID, A.AMT
FROM CUSTGROUP A, -- recursive reference to do roll up
      CUST_GROUP B,
      CUSTOMER C
WHERE A.CUSTID = B.CUSTID
AND    B.GROUPID = C.GROUPID)

SELECT CUSTID, SUM(AMT) -- final tally!
FROM CUSTGROUP;

This enabled all money’s to be rolled up in the hierarchy and reported at each level. The examples here have been greatly simplified for easy understanding, but hopefully the explanation is clear. This took an infinite process and turned it into a 3-minute process. Recursion can be a life saver in a circumstance such as this.

Web Site Menu

A Db2 for z/OS customer sold products online. They sold a lot of products online, as much as a million products per day. With this sort of volume, the web site menu is impacted by the constant changes in product inventory. Here is an abstract of what the web menu might look like.

A “PRODUCTS” menu item may have sub-items

  • “MEN’S CLOTHING”
    • “SHIRTS”
    • “SUITS”
    • “PANTS”
  • “WOMEN’S CLOTHING”
    • “DRESSES”
    • “BLOUSES”
    • “SHOES”
  • “CHILDREN’S CLOTHING”
    • “BOYS”
    • “GIRLS”

The data for the web menu was provide view a table.

pic2.png

There was a batch COBOL process which ran several times per day to update the web site with a new menu. It looped through the various levels of the menu, and made decisions based upon product availability. Unfortunately, the process ran in about 8 minutes, and locked out transactions while it was running. Since the table was self-referencing it was the perfect candidate for recursive SQL. The COBOL process was replaced by a single SQL statement. This statement ran in just a few seconds and locked out no transactions whatsoever. This allowed the customer to run the statement much more often, keeping the web menu fresh all day.

MENU has a NODE value of 0, and its children have a PARENT_NODE value of 0, and so on…

 WITH GEN_MENU(NODE, PARENT_NODE, NODE_NAME, NODE_DESC, N) AS
(SELECT NODE, PARENT_NODE, NODE_NAME, NODE_DESC, 1
FROM   MENU_ITEM
WHERE  NODE = 0
AND    ACTIVE_FLG = 'Y'
UNION ALL
SELECT B.NODE, B.PARENT_NODE, B.NODE_NAME, B.NODE_DESC, N+1
FROM   MENU_ITEM B , GEN_MENU A
WHERE     B.PARENT_NODE = A.NODE
AND    N < 20
AND    ACTIVE_FLG = 'Y')
SELECT NODE, PARENT_NODE, NODE_NAME, NODE_DESC, N
FROM GEN_MENU

Summary

Theoretically, recursive SQL is pretty cool. However, from a practical perspective it is extremely useful for specific use cases in which you have a circular reference within your database schema. It is also great for generating data from nothing due to it’s ability to loop within a single statement.

3 Likes
Recent Stories
Early stage predicate evaluation with DECFLOAT and implicit casts

Meet “the Crusher” - or how we learned to love query acceleration

May sum up at IDUG