Temporary Tables in Db2 (along with a painful lesson)

Posted By: Brian Laube
Introduction: What is a Temporary Table

A temporary table (in most RDBMSs) is a place to keep data for the duration of some application process. The data is basically private and only available to the application user.

The typical user of a temporary table ranges from ad-hoc dynamic end-users (like you or me) up to real application programs (native SQL stored procedures or COBOL).

Temporary tables are used in dynamic and static SQL. 

A temporary table is a place to store data relevant to your needs, whether you are an application or some end-user. It is a place to keep data. It gives you SQL coding options. It can help reduce SQL complexity! You can break down potentially complex, confusing or large SQL into smaller SQL chunks via temporary tables.

In a sense, by breaking big SQL into multiple smaller SQL statements that reference a temporary table, you hopefully make the overall SQL functionality easier to understand.

And you have some control of the access path… you almost force, or at least encourage, DB2 to do the work in the order you specify because of your SQL chunks.

Examples of use of a temporary table:

  • a high-volume online transaction might need to hold working data in a temporary table for a short time to make the transaction logic simpler
  • a low-volume batch reporting application might have complex reporting requirements that can be simply implemented by using a temporary table
  • and sometimes we end users (DBA or developer or support) have some totally dynamic and ad-hoc complex query requirement where a temporary table provides a simple way to hold data to make a multi-step process easier.
    • A temporary table can be used dynamically (in SPUFI, batch DSNTEP2, or workstation tools like Data Studio) to hold data while you run a series of very complex, almost procedural queries against a series of tables. This can help YOU avoid writing a program! Very handy (a small sketch of this multi-step style follows this list).
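
For example, here is a minimal ad-hoc sketch of that multi-step style. The application table names (ORDER_TABLE, ACCT_TABLE) and the work-table layout are hypothetical, and the DECLARE GLOBAL TEMPORARY TABLE syntax is covered in detail later in this article:

-- step 1: declare a private work table (DGTT syntax is explained below)
DECLARE GLOBAL TEMPORARY TABLE WORK01
( ACCT_ID INTEGER
, TOT_AMT DECIMAL(15,2) )
ON COMMIT PRESERVE ROWS;
-- step 2: fill it with one simple, understandable chunk of SQL
INSERT INTO SESSION.WORK01 (ACCT_ID, TOT_AMT)
SELECT ACCT_ID, SUM(AMT)
FROM ORDER_TABLE
GROUP BY ACCT_ID;
-- step 3: use that small result in the next, separate chunk of SQL
SELECT W.ACCT_ID, W.TOT_AMT, A.ACCT_NAME
FROM SESSION.WORK01 W
JOIN ACCT_TABLE A ON A.ACCT_ID = W.ACCT_ID
WHERE W.TOT_AMT > 1000
;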

The IBM Knowledge Center provides excellent documentation on HOW to use temporary tables. This article summarizes the HOW but emphasizes the WHEN and in what scenarios. And importantly, the real-life example below highlights the pros and cons of both types of temporary tables.


CGTT – CREATED GLOBAL TEMPORARY TABLES

Of course, the IBM Knowledge Center has the proper and complete reference for CGTT: (link to DB2 V12 Knowledge Center for CGTT)

The CGTT is created once (often by the DBA) in advance… and then others (applications or end-users) can use it. Authorization to create a CGTT is required!

Each application has its own instance of the created temporary table while it is using it! The data is private to the user of that instance of the CGTT.

A CGTT instance exists when an application uses it by referencing it in OPEN, SELECT, DELETE or INSERT. Most likely you will use INSERT to stuff it with data the first time!

The instance of the temporary table exists until the application process ends, the remote server connection terminates, or the unit of work completes (ROLLBACK or COMMIT).

  • With COMMIT, Db2 will delete the instance of the temporary table UNLESS it is referenced by an OPEN cursor that was defined ‘WITH HOLD’.
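
As a minimal sketch of that WITH HOLD exception (using the sample CGTT DBCDBD1.CGTT01 defined below, in an embedded-SQL style where :HV1 through :HV4 are placeholder host variables):

-- a cursor defined WITH HOLD keeps the CGTT instance (and its rows) across a COMMIT
DECLARE CUR_HOLD CURSOR WITH HOLD FOR
  SELECT TBCREATOR, TBNAME, NAME, COLNO FROM DBCDBD1.CGTT01;
INSERT INTO DBCDBD1.CGTT01 (TBCREATOR, TBNAME, NAME, COLNO)
SELECT TBCREATOR, TBNAME, NAME, COLNO
FROM SYSIBM.SYSCOLUMNS
FETCH FIRST 10 ROWS ONLY;
OPEN CUR_HOLD;
COMMIT;   -- the instance and its rows survive because CUR_HOLD is open WITH HOLD
FETCH CUR_HOLD INTO :HV1, :HV2, :HV3, :HV4;   -- still valid after the COMMIT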

Each application schema/database/environment has its own version of the CGTT. Remember, they are created in advance! The same CGTT name could theoretically be defined differently in different schemas. The CGTT can change over time like any other table; you would migrate the changes up through the environments as the CGTT changes.

  • DB2 does not perform locking or logging operations for a CGTT.
  • You cannot use MERGE with a CGTT.
  • You can use DELETE, but only without a WHERE clause. Basically, you can only use DELETE to empty the CGTT.
  • No UPDATE of a CGTT.
  • Basically, use INSERT to get data inside. You can use SELECT INTO.
  • SELECT is used to query the CGTT. Or OPEN a cursor.
  • No indexes are allowed on a CGTT.

If you are truly finished with a CGTT and no application program or end-user uses it anymore… then you must DROP TABLE to get rid of it!

Sample DDL for a CGTT:

CREATE GLOBAL TEMPORARY TABLE DBCDBD1.CGTT01
( tbcreator varchar(128)
, tbname varchar(128)
, name varchar(128)
, colno smallint)  ;
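
And a minimal usage sketch against that sample CGTT, keeping the restrictions above in mind (INSERT to fill it, SELECT to query it, DELETE without a WHERE clause to empty it):

-- fill the private instance (INSERT is how data gets inside)
INSERT INTO DBCDBD1.CGTT01 (TBCREATOR, TBNAME, NAME, COLNO)
SELECT TBCREATOR, TBNAME, NAME, COLNO
FROM SYSIBM.SYSCOLUMNS
WHERE TBCREATOR = 'SYSIBM';
-- query the private instance
SELECT TBNAME, COUNT(*) AS COL_CNT
FROM DBCDBD1.CGTT01
GROUP BY TBNAME;
-- empty the instance (a DELETE with no WHERE clause is the only DELETE allowed)
DELETE FROM DBCDBD1.CGTT01;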


DGTT – DECLARED GLOBAL TEMPORARY TABLES

Again, the IBM knowledge center is the official place for documentation for DGTT: (link to DB2 V12 Knowledge Center for DGTT) 

The DGTT is defined or declared as it is needed by an application or end-user. No special authorization is required to use DGTT.

The DGTT is not stored in the DB2 catalog. It is not persistent. It cannot be shared. You DECLARE and then you use it!

  • If some other process/person wants to use the ‘same’ DGTT then they must declare it when they need it!

The DGTT is unique to the application/user that defined it. 

  • Theoretically, multiple parallel users could define the same DGTT with the same name… but they only use their own DGTT!

Subsequent references to the DGTT name must use “session” as the schema. 

When the application process terminates, the temporary table is implicitly dropped.

  • A DGTT can have zero or one or many indexes.
  • A DGTT allows UPDATE.
  • On COMMIT, the DGTT can be auto-emptied (the default), dropped entirely, or it can preserve its rows.
  • A DGTT can be logged or not logged (as of V11). If not logged, then on rollback it can delete the rows or preserve the rows.

Sample SQL for a DGTT: 

DECLARE GLOBAL TEMPORARY TABLE DGTT01
( TBCREATOR VARCHAR(128)
, TBNAME VARCHAR(128)
, NAME VARCHAR(128)
, COLNO SMALLINT)
on commit drop table
-- on commit delete rows -- this would be default
-- on commit preserve rows
logged -- default behaviour
--not logged on rollback delete rows
--not logged on rollback preserve rows
;
-- later references to this DGTT must use the prefix "session", as in "session.DGTT01"
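
And a minimal follow-on sketch of two things a DGTT allows that a CGTT does not (an index and UPDATE), using the sample DGTT above. The index name is just an example; note that an index on a DGTT is also qualified with SESSION:

-- a DGTT can have indexes, including a unique index to guarantee uniqueness
CREATE UNIQUE INDEX SESSION.DGTT01_IX1
  ON SESSION.DGTT01 (TBCREATOR, TBNAME, NAME);
-- and a DGTT allows UPDATE of its contents
UPDATE SESSION.DGTT01
   SET COLNO = COLNO + 1
 WHERE TBCREATOR = 'SYSIBM';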


An example of mis-use of DGTT vs CGTT

At a high level both types of temporary tables provide a simple temporary table functionality for simple SQL needs. 

And at a super-high level, DGTT seems much easier than CGTT. Perhaps even “better” (although one should always remember… “it depends”). 

No DBA required for DGTT. DGTT allows indexes. You can UPDATE a DGTT. What is not to love?

And the simplest examples of using a temporary table are ad-hoc dynamic queries that a DBA might come up with at his desk. 

And in that case, the DBA might use DGTT because they are easy and can be re-done over and over until the SQL script is perfected. 

And then the DBA might tell a bunch of his favorite developer-buddies about DGTT and CGTT, with an emphasis on how DGTT is pretty cool and easy for said developers to use! No need to bother the DBA. The DBA likes that.

And then a few years might pass. The DBA continues to use DGTT but is unaware of how much the developers are using temporary tables in applications. As far as the DBA knows, the usage of temporary tables by the applications and developers is limited or non-existent.

One day, an exciting and new application is developed in-house and implemented. It goes to prod. It is a futuristic and modern application almost entirely written with DB2 native SQL stored procedures.

After a few days, we realized that the CPU life was being sucked out of our mainframe. 

Some investigation showed the culprit was this new application.

Further investigation quickly led to the new and high-cost procedures. (a combination of online monitor thread history PLUS my DB2 performance database made investigation easy)

A quick review of these top 3 or 5 procedures showed source code with some less-than-ideal-and-crazy-complex SQL (stupid-stupid SQL) along with a reliance on DGTT.

  • In my case with this application, the overly complex (subjectively called stupid-stupid) SQL was out of scope for my review. Apparently, the SQL did what was required.

There were several DGTTs used to gather data from various application tables. The procedures stuffed the data into the DGTTs (and counted the rows inside) and then returned the contents of the DGTTs to the caller (via open cursor).

Apparently, the project got through “acceptance testing” with no complaints. They never asked the DBA to review code or application performance measurements.

In “acceptance”, the response time and CPU time were well under a second. It seemed fine. People hit enter and the response came back promptly. BUT, the math would say that doing the same thing thousands of times per day would add up to something significant. No one did the math.

Now we needed to do something.


The Cons of DGTT (especially in this case)

The DGTT declaration is an extra SQL statement, and it must be executed every time it is needed. So every time you DECLARE your DGTT, you issue an extra SQL statement. Declaring it once is not bad. But doing it thousands of times adds up.

The DGTT could be inside a static program (COBOL or a native SQL stored procedure). The layout of the DGTT is not really known (because it is not guaranteed) at CREATE PROCEDURE or BIND PACKAGE time. Therefore, importantly and expensively, the SQL statements that use the DGTT must be dynamically bound (an incremental bind) at execution time for DB2 to figure out the optimal access path. Every SQL statement that references the DGTT must have its access path decided at run time. Doing this once is not expensive. But doing it thousands of times really adds up. And it is probably not worth the cost, because it is likely to be the same access path every time!

Together, these two little things add up. 

We had many procedures that used this concept.


The Pros of CGTT (in this case)

The DBA created a CGTT. The procedure was modified to use the CGTT instead of the DGTT. The application logic did not change at all. The actual SQL was not changed either; although it could have used some review, that was beyond the immediate scope.

Switching to CGTT resulted in a 96% reduction of CPU in this one procedure. It went from something noticeable to something almost costless. 

The rest of the high-cost and high-volume procedures using DGTTs were changed to CGTTs, and the overall application CPU cost went from hours to reasonable minutes. Thankfully.

The chart below shows the change in cost of this real-life procedure before and after the switch from DGTT to CGTT.

The PKG_ALLOC and CNT (count of accounting trace records in the DB2 performance database) show that the procedure package usage remained relatively consistent over the relevant time period. But the total CPU_MIN (total CPU minutes) and the CPU_AVG show the obvious drop in cost. The SQL count went down because the procedure now issues fewer SQL statements. The CPU_MAX is a bit high because some odd input-data cases would cause the SQL search to be expensive (that is another issue).

DT    SSID PCK_ID                      PLAN_NAME CPU_MIN CPU_MAX   CPU_AVG  PKG_ALLOC SQL    CNT
----- ---- --------------------------- --------- ------- --------- -------- --------- ------ -----
03-01 PDBB SRCH_FOR_PRVDR              YTCIRRP1      1.2  0.193633 0.005655    147057 248937 13315
02-28 PDBB SRCH_FOR_PRVDR              YTCIRRP1      1.2  0.150026 0.005225    154246 260650 14285
02-27 PDBB SRCH_FOR_PRVDR              YTCIRRP1      1.2  0.191723 0.005337    153097 260189 14147
02-26 PDBB SRCH_FOR_PRVDR              YTCIRRP1      1.2  0.135898 0.005248    161023 266483 14606
02-23 PDBB SRCH_FOR_PRVDR              YTCIRRP1      1.3  0.168243 0.005679    156703 264599 13953
02-22 PDBB SRCH_FOR_PRVDR              YTCIRRP1     30.6  0.684351 0.134640    160194 468635 13645
02-21 PDBB SRCH_FOR_PRVDR              YTCIRRP1     29.8  0.634492 0.125066    165294 462562 14301
02-20 PDBB SRCH_FOR_PRVDR              YTCIRRP1     28.2  0.609018 0.118179    156760 440831 14367
02-19 PDBB SRCH_FOR_PRVDR              YTCIRRP1     14.2  0.573073 0.099820     88111 239414  8538
02-16 PDBB SRCH_FOR_PRVDR              YTCIRRP1     29.3  0.598881 0.126219    158599 455446 13936
02-15 PDBB SRCH_FOR_PRVDR              YTCIRRP1     30.3  0.667659 0.125824    161636 467731 14484
02-14 PDBB SRCH_FOR_PRVDR              YTCIRRP1     28.5  0.580629 0.114303    162520 446162 14963
02-13 PDBB SRCH_FOR_PRVDR              YTCIRRP1     29.7  0.664645 0.118736    162726 458787
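
Doing the math that no one did earlier: before the change, roughly 0.125 CPU seconds per occurrence times roughly 14,000 occurrences per day works out to about 1,750 CPU seconds, or about 29 CPU minutes per day, which lines up with the CPU_MIN (total CPU minutes) column. After the change, roughly 0.0055 CPU seconds times roughly 14,000 occurrences is about 77 CPU seconds, or a little over 1 CPU minute per day.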

In hindsight, the lesson learned is that CGTTs are well suited for

  • small result sets (no need for index)
  • being used over and over again, especially in a high-volume transactional application (CICS or web)
  • static programs
  • no need for UPDATE of temporary table contents

And DGTTs are best suited for

  • Ad-hoc query or reporting.
    • DGTT can help break up complex reporting requirements into multiple smaller and easier-to-understand SQL steps
  • Low-volume regular reporting (yearly, monthly or even daily, but probably not thousands of times per day)
  • Large results (or small)
  • Results that require updates
  • Results that would benefit from an index for later queries.
  • A unique index can guarantee uniqueness
  • Easy to use by anyone -> no need to bother the DBA

One final and extra thought about using any type of temporary table to gather data in order to return it via a cursor: it is probably best, if possible, to avoid gathering data into a temporary table and then returning the contents of the temporary table via a standard open cursor and result set. Just sending the result set directly to the caller is typically faster.

But you might use temporary tables for other reasons. Perhaps the required SQL logic is complex and using temporary tables improves readability and maintenance of the SQL. Or perhaps you want to use a temporary table so that the locks on the source tables can be released while the data is being passed back to the caller.
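
To make the first point concrete, here is a minimal sketch of the "just return the result set" alternative inside a native SQL procedure body (reusing the catalog query from the samples further below), instead of inserting into a temporary table and then opening a cursor on the temporary table:

-- declare the WITH RETURN cursor directly over the source query...
DECLARE CUR1 CURSOR WITH RETURN FOR
  SELECT TBCREATOR, TBNAME, NAME, COLNO
  FROM SYSIBM.SYSCOLUMNS
  FETCH FIRST 10 ROWS ONLY;
-- ...and simply open it; no temporary table is materialized
OPEN CUR1;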


Example of using DGTT and CGTT with cost comparison

Further below are source code examples of two simple procedures. One uses a CGTT and the other a DGTT. They work off the DB2 catalog, so these samples basically work anywhere.
The procedures are logically and functionally the same. 

The one using the CGTT required bothering the DBA to create the CGTT in advance.

Each procedure was called a hundred times and then the average CPU cost was calculated.

Insert 10 rows into the TT and count the rows and return the rows.

  • call dbcdbd1.tt_test_cgtt(?) - avg CPU 0.0004
  • call dbcdbd1.tt_test_dgtt(?) - avg CPU 0.0080

The procedure with the DGTT used 20 times more CPU. This is meant to simulate a very simple and fast transaction that needs a temporary table and has a small result set.

Insert 999 rows into the TT and count the rows and return the rows. 

  • call dbcdbd1.TT_TEST_CGTT999(?) - avg cpu 0.004
  • call dbcdbd1.TT_TEST_DGTT999(?) - avg cpu 0.018

The procedure with the DGTT used 4.5 times more CPU. This is meant to simulate a medium-complexity transaction that needs a temporary table and has a large result set.

The two comparisons above emphasize that there is no fixed rule for the benefit or cost of CGTT vs DGTT. As always, it depends.


Sample procedure using CGTT
--CREATE GLOBAL TEMPORARY TABLE DBCDBD1.CGTT01
--( tbcreator varchar(128)
--, tbname varchar(128)
--, name varchar(128)
--, colno smallint)
--;
CREATE PROCEDURE TT_TEST_CGTT
                (OUT OUT_CNT_OF_ROWS_TABLE INTEGER)
    VERSION V1  LANGUAGE SQL  APPLCOMPAT V11R1 NOT DETERMINISTIC  
        DYNAMIC RESULT SETS 1 -- QUANTITY OF OPEN CURSORS AFTER EXECUTION 
        QUALIFIER DBCDBD1 -- FOR UNQUALIFIED DB2 OBJECTS (SUCH AS TABLES!)
        PACKAGE OWNER DBCDBD1 -- OWNER OF PROCEDURE ... FOR AUTHORITY
        ISOLATION LEVEL CS
    WLM ENVIRONMENT FOR DEBUG MODE DDBCDBG1 -- IF DEBUGGING, THEN debug here
    ASUTIME LIMIT 800000 -- REASONABLE LIMIT ON SERVICE UNITS (CPU) 
P1: BEGIN

--  DECLARE CURSOR
DECLARE CUR1 CURSOR WITH RETURN FOR
 SELECT * FROM CGTT01
 ;

-- INSERT INTO
INSERT INTO CGTT01
(TBCREATOR, TBNAME, NAME, COLNO)
SELECT TBCREATOR, TBNAME, NAME, COLNO
FROM SYSIBM.SYSCOLUMNS
WHERE 1=1
  AND RAND() < 0.50
FETCH FIRST 0010 ROWS ONLY
;
-- COUNT ROWS IN
SELECT COUNT(*) INTO OUT_CNT_OF_ROWS_TABLE FROM CGTT01;
-- OPEN CURSOR
OPEN CUR1;
 
END P1#

Sample procedure using DGTT
CREATE PROCEDURE TT_TEST_DGTT
                (OUT OUT_CNT_OF_ROWS_TABLE INTEGER)
    VERSION V1  LANGUAGE SQL  APPLCOMPAT V11R1 NOT DETERMINISTIC  
        DYNAMIC RESULT SETS 1 -- QUANTITY OF OPEN CURSORS AFTER EXECUTION 
        QUALIFIER DBCDBD1 -- FOR UNQUALIFIED DB2 OBJECTS (SUCH AS TABLES!)
        PACKAGE OWNER DBCDBD1 -- OWNER OF PROCEDURE ... FOR AUTHORITY
        ISOLATION LEVEL CS
    WLM ENVIRONMENT FOR DEBUG MODE DDBCDBG1 -- IF DEBUGGING, THEN debug here
    ASUTIME LIMIT 800000 -- REASONABLE LIMIT ON SERVICE UNITS (CPU)
 
P1: BEGIN
 
-- DECLARE DGTT
DECLARE GLOBAL TEMPORARY TABLE DGTT01
( TBCREATOR VARCHAR(128)
, TBNAME VARCHAR(128)
, NAME VARCHAR(128)
, COLNO SMALLINT)
on commit drop table
;
--  DECLARE CURSOR
BEGIN
 DECLARE CUR1 CURSOR WITH RETURN FOR
  SELECT * FROM SESSION.DGTT01
 ;
-- INSERT INTO DGTT
INSERT INTO SESSION.DGTT01
(TBCREATOR, TBNAME, NAME, COLNO)
SELECT TBCREATOR, TBNAME, NAME, COLNO
FROM SYSIBM.SYSCOLUMNS
WHERE 1=1
  AND RAND() < 0.50
FETCH FIRST 0010 ROWS ONLY
;
-- COUNT ROWS IN DGTT
SELECT COUNT(*) INTO OUT_CNT_OF_ROWS_TABLE FROM SESSION.DGTT01;
 
-- OPEN CURSOR
 OPEN CUR1;
END;

END P1#

A few final thoughts on DGTT and LOGGED vs NOT LOGGED

This is not exactly related to the previous topics… but it is an interesting consideration when defining a DGTT.

A DGTT allows you to optionally specify LOGGED vs NOT LOGGED (since V11).

As usual with DB2, the default is LOGGED. Logging is generally good.

The NOT LOGGED option will give a performance benefit when you populate the DGTT: you avoid writing to the “log” and the CPU cost of writing to the “log”. The benefit is not as large as one might expect, especially for CPU. But it is something. And if you insert many, many rows, then you will definitely save log space and avoid filling up the “log” with low-value log records.

An interesting caveat applies when the DGTT has a unique index.

If you get halfway through inserting data into the DGTT and receive SQLCODE -803 for attempting to insert a row that would violate the unique index, then the DGTT will be emptied if it is NOT LOGGED. If it is LOGGED, then the DGTT will continue to hold the previously inserted rows. Be careful of this difference in behaviour!
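
For reference, a minimal sketch of a NOT LOGGED DGTT with a unique index (the combination where the -803 behaviour above matters); the table and index names here are just examples:

DECLARE GLOBAL TEMPORARY TABLE DGTT02
( TBCREATOR VARCHAR(128)
, TBNAME VARCHAR(128)
, NAME VARCHAR(128)
, COLNO SMALLINT)
ON COMMIT PRESERVE ROWS
NOT LOGGED ON ROLLBACK DELETE ROWS
;
CREATE UNIQUE INDEX SESSION.DGTT02_IX1
  ON SESSION.DGTT02 (TBCREATOR, TBNAME, NAME);
-- if a later INSERT hits SQLCODE -803 on this unique index, the NOT LOGGED table is emptied;
-- a LOGGED table keeps the rows inserted before the failure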

In general, one should use the DGTT default of LOGGED unless one has a specific reason to use NOT LOGGED. Think about it!