Travel through time with Db2 SQL to investigate complex application data change questions

Posted By: Brian Laube Technical Content,

The poor beleaguered support staff sometimes come up with the strangest questions.

Why does policy 123 in the POLICY table have status code X. I could have sworn the status code should be Y. How could this be? Who did it? When? Why?

That is actually a very interesting question and the application DBA might be able to help. In our case, the POLICY table has helpful field called “update_timestamp”. But the value for this row is from 10 years ago. Apparently, the application code must maintain this field and we should not always trust it to be updated. The main application programs do maintain this table field value when doing updates. But the occasional application data zap occurs without updating it!

What is the traditional approach to helping support with this challenging data change question?
  • First, we remind them that they should always update their application table field “update_timestamp” when making data changes because the point of it is to help with this type of question! But this is not the time to berate our poor support team. It is probably beyond their control anyways! WE might suspect some data zap occurred and forgot about the application audit fields. But they do not want to know that now. We just need to find a solution today.
  • Second, if they have vaguest idea of when the data change occurred then we could use the trusty log analyzer tool to dig into data change. It is possible with all decent log analyzer tools (from all vendors) to produce a detailed report of data changes based upon very specific criteria (such as the table policy key field or even the status code field). But this log analyzer approach can be difficult if the possible change was at some unknown time in the past… or a very broad time in the past. Searching too much log with very specific criteria is very very time consuming
    • Minor digression: Do NOT get me started on tables without DATA CAPTURE CHANGES… at the very least, all my “important” application tables have DATA CAPTURES CHANGES enabled… it makes later log analysis so much easier and less painful!
  • Third, we could UNLOAD from various FULL IMAGE COPY backup datasets. Remember, you can UNLOAD from IMAGE COPY with WHEN condition on a table field (not to be confused with SQL and WHERE) and search for this policy and spot check various times in the past. This might help figure when the data changed.
  • Fourth, maybe this type of question has come up in the past and the application DBA proposed and implemented that this important application table to be enhanced as a system-period-TEMPORAL table! Then it is almost easy to investigate these broad data change questions! That is our situation today!

There have been IDUG presentations and various Db2 blog articles on this concept of temporal tables. The concept exists in both Db2 Z and Db2 LUW. An example script is below! BUT, the main point of this little blog is to not go over the exact details on how to enable system-period TEMPORAL tables, this little blog is to remind you of the power of this Db2 V10 enhancement! This concept can be super helpful when digging into these data change questions!

At a high level, one adds a few new columns to the main table and creates another table that is identical to the original table. The second table is called the “history” table (called “history” table by me). Then you ALTER the main table one more time to tell Db2 that “data versioning” is now enabled and you tell Db2 the name of the “history” table! Voila! And now before every UPDATE or DELETE, DB2 will move the real/main table current row values to the history table and then change the real/main table as per the UPDATE or DELETE. Changes to both tables are effective with the commit. AND now, you can now directly query the history table to see row values over time. Or you can query the main table with "time travel" SQL syntax of "for system time as of '<some timestamp in the past>' and Db2 will figure it out and tell you the values from that time. (examples below)


How did I use my application system-period temporal table to help my support team investigate their data change question?

We ran a series of queries to the system period table (example SELECT below)

  • Basically, we confirmed the table was really updated by user ABC at a specific timestamp. This was confirmed by the “generated always” fields that were recently added!
  • And we could see the previous row value on a previous date to confirm what we thought it was on that date. This is how one travels through time with Db2 SQL.
  • And we queried the history table itself, to confirm how the policy changed over time and when.

All these results put together helped us understand that some bit of “occasional” or “ad-hoc” application was running (from a strange place) and changing more data than we thought! Excellent! Root cause discovered! The fix was another issue…. Another story

-- sql select examples of db2 history tables (temporal tables)
SET CURRENT SQLID='<tbcreator>';

-- query direct from main-parent table
SELECT *
FROM TPOLICY
WHERE POLICY_ID = 123
;

-- query direct from main-parent table as of system_time!
SELECT *
FROM TPOLICY
for system_time as of '2021-02-05 11:25:38' -- SQL time travel!!!
WHERE POLICY_ID = 123
;

-- query THE history table directly and stare at result
SELECT *
FROM TPOLICY_HIST
WHERE POLICY_ID = 123
ORDER BY SYS_END DESC
--ORDER BY LAST_UPD_TS  -- ANOTHER WAY TO ORDER THE RESULT SET
;

 


Are there any caveats or considerations when enabling this type of temporal table?

Of course! It would not be a good database concept without there being some “it depends”!

  • Was the original application table created one hundred years ago? Did it have any application audit-controlled fields like “update_timestamp” or “update_userid”? Do you trust these fields? (you should not). It is now time to consider adding some extra Db2 controlled audit columns fields to the table with “generate always”!
    • Let Db2 maintain the last update timestamp or the last change userid or the last DML type. And there are more possibilities (maybe program name or workstation)! With generate always, the application does not need to know or worry about them. Almost invisible. And you can trust them. Examples are in DDL script below.
  • What tables should be considered for system period temporal tables? Obviously, you can ignore insert only tables. Really, start with the ones with the tables you know are critical and central to the application. These are tables like “customer” or “policy” or “group” or “person” or “any-entity”. They probably do not have huge quantity of DELETE or UPDATE. You have to have some awareness of the quantity of DELETE or UPDATE on these tables. A small table with a million updates a day will mean a million inserts into the history table. Is that too much? A gazillion row table with no known DELETE or UPDATE could be considered for altering as system period temporal table! Why not? The amount of history should be low… but when it does occur then you can understand why! It really depends.
  • If you enable any type of temporal table then consider if a purge strategy is required for the history table. If the size of the history table is not large (you may define large for yourself) then keeping the history forever is not a problem. Hopefully you can keep some weeks or months or even years of history. Remember, space is cheap, and you never know when the history will be useful! If you must purge, then think about… there is often a logical purge criteria.

Some final thoughts on wording and definitions.
  • Do I need to say this? I apologize in advance, but I feel that I need to say the obvious just to be clear and to eliminate possible confusion. Temporal tables are not temporary tables. Temporal tables let you travel through time and see how your data changes over time! Temporary tables (declared or created) are temporary and are almost the opposite of temporal! The names are phonetically similar, but do not confuse the two concepts.
  • System period temporal tables are easy to implement, and applications do not know need to be aware or make application change! The Db2 system controls the system period and times. User period or user controlled temporal tables are similar, but the user or application gets to enforce the begin and end times. This can be very powerful. But the application obviously needs to think about it. This requires thought. When you read about both types of temporal tables in the blogs and presentations and IBM documentation, they often discuss both concepts equally. But really, the system-period tables are an easy first step to learning about the temporal concept. Start with system period temporal tables!

 

-- sample SQL DDL script to enable system-period controlled tables!!
SET CURRENT SQLID='<tbcreator>';

-- BEFORE ENABLING THE TABLE AS SYSTEM-CONTROLLED TEMPORAL...
-- ... ADD SOME USEFUL Db2 MAINTAINED TABLE FIELDS (useful to me!)
ALTER TABLE TPOLICY
  ADD
    LAST_UPD_TS TIMESTAMP NOT NULL
     GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
  ADD
    LAST_UPD_USERID VARCHAR(128)
     GENERATED ALWAYS AS (SESSION_USER)
 ADD
    LAST_DML_TYPE_CD CHAR(1)
     GENERATED ALWAYS AS (DATA CHANGE OPERATION)
;

-- ENABLE THE APPLICATION AS SYSTEM-CONTROLLED TEMPORAL 
-- BY ADDING SOME Db2 MAINTAINED FIELDS
ALTER TABLE TPOLICY ADD COLUMN SYS_START TIMESTAMP(12) NOT NULL
 GENERATED ALWAYS AS ROW BEGIN;
ALTER TABLE TPOLICY ADD COLUMN SYS_END TIMESTAMP(12) NOT NULL
 GENERATED ALWAYS AS ROW END;
ALTER TABLE TPOLICY ADD COLUMN TRANS_ID TIMESTAMP(12)
 GENERATED ALWAYS AS TRANSACTION START ID;
ALTER TABLE TPOLICY
 ADD PERIOD SYSTEM_TIME(SYS_START, SYS_END);

CREATE TABLESPACE ZPOLH IN <dbname>
 USING STOGROUP ... yada yada;

CREATE TABLE TPOLICY_HIST LIKE TPOLICY IN <dbname>.ZPOLH;

-- NOW THINK.. PUT *USEFUL* INDEXES ON THIS "HISTORY" TABLE
-- >> ASK YOURSELF! HOW WILL IT BE USED AND QUERIED?
-- >> (you probably do not need as many indexes as the real prod table)

CREATE INDEX X01POLH ON TPOLICY_HIST (POLICY_ID) USING.... ;

-- FINALLY... TELL DB2 THESE TWO TABLES ARE NOW RELATED!
ALTER TABLE TPOLICY
ADD VERSIONING USE HISTORY TABLE TPOLICY_HIST
ON DELETE ADD EXTRA ROW;

-- done
-- 
-- it is optional, but probably prudent at his point,
-- to REORG the main tablespace now
-- do not forget that the main and history tablespace both get appropriate
-- image copy… and the occasional or regular reorg
-- … if you use LISTDEF… 
-- then your existing DBA COPY and REORG jobs may not need change