Temporal and Archive - How to Manage Data Over Time

This month we will be focusing on temporal table design and Archive tables. DB2 11 for z/OS brought many interesting improvements for temporal tables, which will be addressed during the upcoming month. On top of that in gave us archive tables, which we can use for “transparent archiving”. As a first article this month, archive table will be covered.

We are all aware that data can grow either for legal reasons or lack of a cleanup process (we’ll fix that at the end, which then never happens as the development budget is fully spent). This doesn’t mean that all of that data is constantly being used, or is active. This inactive data can create a problem, when it is kept among the active data. As our active tables grow, access might get slower. Setting up archive tables, before DB2 11 for z/OS, could be a lot of work, and maintaining them could be a pain. So however useful, plenty of DB2 shops only used to set up archiving if it was the only way out.

IBM introduced “archive tables” in DB2 11 for z/OS. The setup is somewhat similar to “system temporal tables” that were introduced in DB2 10. If you are familiar with system temporal tables, you will recognize similar ways of working. Like with system temporal tables, transparent archiving allows you to define an archive table and associate that with an existing operational table. This operational table is then referred to as the archive enabled table.  Since DB2 is now aware of the connection between the operational table and the archive table, it can take care of the transport of data from the operational table to the archive. It will do this for deletes in the operational table based on the setting of a system global variable, SYSIBMADM.MOVE_TO_ARCHIVE.

The link is established by issuing an alter statement on the operational table.

 

 IDUG 1.jpg

 

Once this connection is established, a delete on the operational data can cause DB2 to move that deleted data to the archive table.  Within your application issuing the delete, you can set SYSIBMADM.MOVE_TO_ARCHIVE = ‘Y’ at which moment the data will be moved. Be careful, the default setting of SYSIBMADM.MOVE_TO_ARCHIVE is “N” meaning that a delete will NOT insert rows into the archive table.  The SYSIBMADM.MOVE_TO_ARCHIVE setting must be changed to Y for archiving to occur.

Sometimes, however you want to delete data because it is wrong, not because you want to archive it. In this case, set SYSIBMADM.MOVE_TO_ARCHIVE to its default value of “N”. This prevents temporarily incorrect data from being saved as archived data.

Be advised that when SYSIBMADM.MOVE_TO_ARCHIVE is set to Y, it is assumed that only archiving is taking place. This value prevents inserts and updates from occurring to the operational table during this time.  If you want to allow inserts and updates during the archiving of deletes,  you can set SYSIBMADM.MOVE_TO_ARCHIVE to “E” to allow this behavior..

Retrieving data from an archived enabled table is made easy.  The default behavior is when running a query on the operational table, only data from the operational table is returned. This is likely normal behavior.  When an application can determine it needs operational data and also archived data, in that situation it can set new system global variable SYSIBMADM.GET_ARCHIVE = 'Y'.  The great part about this, you only need to write one query and the setting of the global variable in your application will determine which data is returned.  In fact when this variable is to “YES”, DB2 will transform your query against the operational table into a union all between the operational and the archive table.

 

 IDUG 2.jpg

 

This makes the use of archive table very flexible, very easy and very powerful. 

How do you manage data archiving?

Archiving can be done on a periodic or a continual basis depending on the needs of the application. Let’s discuss the considerations for archival and deleting data.

If archiving is a periodic process, an offline delete process is likely a good choice. In this case, use an archival job during off peak hours using SYSIBMADM.MOVE_TO_ARCHIVE = Y. no inserts or updates are allowed at this time, but application impact is minimal and the outage is planned.

Continuous archival is similar to an audit trail, but it only records deleted rows. In addition, it may not be desirable to archive rows that are deleted as a result of corrections to business transactions. As such, the application might have to regularly change the setting of the SYSIBMADM.MOVE_TO_ARCHIVE global variable to properly reflect desired business behavior.

Continual archiving draws us into a comparison to system-period temporal tables. System-period temporal tables are similarly defined using ALTER TABLE ADD VERSIONING. The main difference is system-period tables automatically record all update and delete activity. Also, system-period temporal tables can contain additional columns to enhance the audit trail.

By comparison, using SYSIBMADM.MOVE_TO_ARCHIVE = E allows for only deletes to be recorded. Under application control, the value of the global variable could be changed to affect the archiving behavior. So, while behavior for deletes could be quite similar, it can also be quite different. A table cannot be both an archive-enabled table and a system-period table.

 

Deleting of Archive Table Rows

At some point in the future, it may practical to delete archived table rows. For example, seven years might be a legal requirement, or some other industry or storage requirement may exist. At this point, the data can be removed from the archive table. This must be controlled carefully.

On the other hand, some companies who had the legal right to delete archives have sometimes found if they didn’t delete old data that this can be used against them in court if it still exists. If data exists and it is requested by a court, it must be produced even if it is older than what needed to be retained.

 Here is a list of further things to consider:

  1. The same query in an application can have at runtime a completely different behavior. Depending on input variables the value of GET_ARCHIVE may be set to “Y” or “N” and although the query may look the same, from a performance point of view the behavior is completely different. This is something you should be aware of. 

    Imagine a 2 table join (example Op1 and Op2) and both tables are archived enabled (Arch1 and Arch2), depending on the setting of one variable you will have a simple join between two tables (Op1 joined to Op2) or a join being distributed over two unions (this will lead to the following join: Op1 to Op2, Op1 to Arch2, Arch1 to Op2 and Arch1 to Arch2). When setting up archive tables you should consider providing indexes to facilitate this joins.

  2. Your package containing the select on the operational table must have been bound with ARCHIVESENSITIVE(YES) in order for DB2 to take the value of SYSIBMADM.GET_ARCHIVE into consideration. For SQL scalar function or SQL native stored procedure, the option at create time must be similarly: ARCHIVE SENSITIVE YES.

  3. When making changes to the DDL structure of an archived enabled operational table only alter add column is automatically promoted to the archive table. Any other change will require the DBA to break the archive connection (Alter table operational table disable archive), in order to apply the DDL changes on both the operational table and the archive table and reestablish the archive link.

 

Kurt Struyf and Martin Hubel

IBM Gold Consultants

Recent Stories
Selecting Appropriate Data Architecture Technologies

SQL Tuning for Developers – Part 2 Explain Output

Some of my favorite tunings tips and tricks for DB2 by Joe Geller