Logical Schema Backup and Restore for Columnar Tables

March 14, 2023

Posted By: Aleksandrs Santars in Technical Content,

Part 1 – Introduction

Newly introduced in Db2 11.5.8, Logical Schema Backup and Restore utilities for columnar tables specialize in flexibility, usability, and fine grain approach to both schema/table movement and for disaster recovery of data.

Using Logical Schema Backup and Restore it is possible to:

Backup a portion of the database – single schema.
Restore schema or even a single table out of the logical backup image.
Restore onto a Db2 server with different topology (different number of nodes).
Use disk, TSM, AWS S3, or IBM COS as target media.
Online access (read and write to schema) undergoing backup.
Ability to take full, incremental (cumulative), as well as delta backups.
Restore schema to another name.

Pre-Requisites and Limitations

To take advantage of Logical Schema Backup and Restore the following one-time setup steps are required:

Database manager configuration parameter PYTHON_PATH set up to point to the location of python executable (as part of development package).
Install of ibm_db python API.
Invocation of SYSPROC.SYSINSTALLOBJECTS(‘LBAR’….) stored procedure call to create system tables required for the operation of logical schema backup.

Logical Schema Backup and Restore is available only on schemas that are enabled for ROW MODIFICATION TRACKING, and where all tables are organized by column and enabled for row modification tracking (See Migration section below).

Column-organized tables created in a schema enabled for row modification tracking will be enabled for logical backup. Tables with LOB columns over 64K in length, spatial type, XML type, index extensions, anchor data types, as well as objects referencing other schemas (e.g., index in another schema) cannot be processed by logical backup and backup will fail with an error. As a general rule, what cannot be processed by EXTERNAL TABLE will also be incompatible with logical backup.

Overview of Architecture

Tracking row modifications are required to ensure a single point for consistency both for within a single backup (only rows that existed at start of backup will be included) and for understanding what needs to be included (or deleted) in a subsequent incremental/delta backup. Three implicitly hidden columns will be added to each organized by column table when it is created in schema enabled for row modification tracking:

Column Name	Datatype	Allow NULL	Description
SYSROWID	BIGINT	No	Column that uniquely identifies each row in the table. The ID is unique across all database partitions. The values for this column are generated by a SEQUENCE.
CREATEXID	BIGINT	No	Stores the ID of the transaction that added the row to the table.
DELETEXID	BIGINT	No	Stores the ID of the transaction that deleted the row. It will be zero if the row is not deleted.

After a row is inserted and assigned a value for SYSROWID, this value will not change even if the row is updated.

Above column names are reserved by IBM for use in tables enabled for row modification tracking. Users attempting to use the same column name in the table will receive an error on create or alter table time. Only permanent columnar tables created in a schema enabled for row modification tracking will be enabled for row modification tracking. New columns in the table are visible when describing the table.

Row modification tracking columns are implicitly hidden with values internally generated. Therefore, as with all hidden columns:

will not be seen when executing a SELECT * from the table.
do not need to specify them when executing an insert, update or delete statement.
Attempt to modify the columns (insert/update/delete/alter) will either be ignored or result in an error.

Full, incremental, and delta backups

Full logical schema backup ensures all the data in the backup is from a single point in time. Using db2look DDL of all tables and objects in the schema is captured and all table data is unloaded using EXTERNAL TABLE functionality.

Full logical schema restore locks the schema, drops it, re-creates the schema, all tables and related objects, and populates tables with data using INSERT … SELECT * FROM EXTERNAL TABLE.

Incremental (cumulative) logical schema backup captures DDL changes and data changes (inserts, updates, and deletes) since the last full logical schema backup. Incremental logical schema restore first automatically restores corresponding full logical schema backup image, then applies DDL changes followed by data – insert /delete or required rows.

Delta logical schema backup captures DDL changes and data changes (inserts, updates, and deletes) since the previous logical schema backup (whether full, incremental or another delta). Delta logical schema restore automatically restores all previous corresponding backup images, including DDL and data changes.

For example, consider the following backup sequence:

Full_1, Inc_1, Full_2, Del_1, Inc_2, Del_2, Inc_3, Del_3, Del_4

Restoring Del_4 will restore the following images in sequence: Full_2, Inc_3, Del_3, Del_4.

Performance of logical schema backup is roughly proportional to the unload data size, but is difficult to estimate it up front because of many factors that can affect it, including server CPU load, IO/network performance, table compression, table access plan. It is expected that BACKUP and RESTORE DATABASE performance will be superior to logical schema backup because they generally do not process any data and simply read/store complete physical pages including indexes ‘as-is’. Logical backup has advantages over BACKUP/RESTORE DATABASE that it:

Supports restoring single table or schema.
Is topology and platform agnostic.
As table data is restored, a new dictionary is built and any fragmentation will be cleaned up, in addition RUNSTATs will automatically run and new updated statistics will be created.

Invoking LOGICAL BACKUP and LOGICAL RESTORE

Logical Schema Backup and Restore is a python script that is executed by Db2 using same credentials as the user that invoked corresponding stored procedure. This script runs on the server, but is regarded as an application by the engine, and will be reported as such, for example via LIST APPLICATIONS.

Since it is a SQL application, it is subject to all rules and tools that Db2 has. For example, it is possible to assign a WLM WORKLOAD and fine tune level of CPU consumption. Type of backup and schema is included in client information enabling even finer tuning, for example prioritizing one schema backup over another.

Schema backup is performed by the SYSPROC.LOGICAL_BACKUP stored procedure. On error partial backup image will be removed. On success timestamp is returned as part of return sqlca message. This timestamp is to be used as a label to refer to the backup image, e.g. to restore it. Logical backup ensures that timestamp is unique in case when several backups are started at the same time.

Schema restore is performed by the SYSPROC.LOGICAL_RESTORE stored procedure. The source backup image is indicated via the -timestamp option. A full schema can be restored, or a single table with the -table option. If target schema or table exist, the -drop-existing option must be specified.

To list logical schema backup images at a particular location use SYSPROC.LOGICAL_BACKUP_DETAILS stored procedure. For convenience utility maintains a history of backup and restore operations in SYSLOGICALBAR.LOGICAL_BACKUP_HISTORY and SYSLOGICALBAR.LOGICAL_RESTORE_HISTORY tables respectively.

# Create a schema and a table that are enabled for row modification tracking

$ db2 create schema rmt enable row modification tracking

DB20000I  The SQL command completed successfully.


$ db2 create table rmtrmtrmt.simple"(a int, b int) organize by column"

DB20000I  The SQL command completed successfully.


# Ensure the table is enabled, 1 way is to check for the hidden columns

$ db2 describe table rmtrmtrmt.simple

                                                                           

Column name                     Data type schema   Data type name    Column Length   Scale  Nulls

------------------------------- --------- -------------------        ----------      -----  ------

SYSROWID                        SYSIBM             BIGINT                       8        0     No

CREATEXID                       SYSIBM             BIGINT                       8        0     No

DELETEXID                       SYSIBM             BIGINT                       8        0     No

A                               SYSIBM             INTEGER                      4        0     Yes

B                               SYSIBM             INTEGER                      4        0     Yes


  5 record(s) selected.

# Insert a few rows
$ db2 "insert into rmtrmtrmt.simple values(1,1), (2,2), (3,3)"

DB20000I  The SQL command completed successfully.


$ db2 “select * from rmtrmtrmt.simple”

A           B
----------- -----------

          1           1

          2           2

          3           3


  3 record(s) selected.


# Backup the schema
$ db2 "call sysproc.logical_backup('-type full -schema rmt -path /tmp/backups')"

  Return Status = 0
SQL1796I  Logical backup utility has completed successfully, timestamp for the
backup image is "20230301171808".  SQLSTATE=01541

$ db2 drop table rmtrmtrmt.simple
DB20000I  The SQL command completed successfully.

# Oops!   Need to go get that table back…  find the last image and restore

$ db2 call "sysproc.logical_backup_details('-path /tmp/backups')"



  Result set 1
  --------------


  TIMESTAMP

  --------------------

  20230301171808


  1 record(s) selected.

  Return Status = 0

$ db2 "call sysproc.logical_restore('-type full -schema rmt -path /tmp/backups -timestamp 20230301171808 -table SIMPLE')"
  Return Status = 0

# Check to make sure we have table/contents again…

$ db2 “select * from rmtrmtrmt.simple”


A           B
----------- -----------

          1           1

          2           2

          3           3


  3 record(s) selected.

Migration of existing table data

Existing schemas may be altered to enable row modification tracking. However, this does not affect any tables in the schema – only newly created tables will be enabled for row modification tracking after schema is altered. Existing tables, even if empty, must be re-created.

Recreating tables can be done through any means, for example using ADMIN_MOVE_TABLE, or CREATE TABLE … AS … WITH DATA (select * from one table into a new table in the same schema). The target table will be created as enabled for row modification tracking if the target schema is enabled for row modification tracking and target table type supports row modification tracking. Whether the source table is enabled for row modification tracking or not has no impact on what will be chosen for the target table.

To assist with migration, SYSPROC.LOGICAL_BACKUP can take logical backup image of a schema that is not enabled for row modification tracking by specifying -backup-migrate option. There are some key things to note when scheduling this procedure:

This is a full backup only and cannot be used as a base for follow up incremental/delta backups.

It locks all tables in schema at the start of the backup to ensure a single point of consistency. As such only read access is allowed to any table in the schema for the whole duration of the backup.

In addition, to ensure no data loss between full backup and restore, once backup is done it should be restored before any update activity is done.

If schema has any tables that cannot be created as ROW MODIFICATION TRACKING (e.g. ORGANIZE BY ROW), backup will fail.

Resulting backup image can be restored using -enable-row-mod-tracking option. As before, restore will drop the schema and recreate it as ENABLE ROW MODIFICATION TRACKING. All tables in the schema are attempted to be enabled as well. Any table DDL/condition in original schema that will prevent table from being enabled for row modification tracking will cause the backup to fail.

# Create table in schema not enabled for row modification tracking and populate it with data.
$ db2 create schema non_rmt

DB20000I  The SQL command completed successfully.


$ db2 "create table non_rmt.old(a int, b int)"
DB20000I  The SQL command completed successfully.


$ db2 describe table non_rmt.old

                                Data type                     Column

Column name                     schema    Data type name      Length     Scale Nulls

------------------------------- --------- ------------------- ---------- ----- ------

A                               SYSIBM    INTEGER                      4     0 Yes

B                               SYSIBM    INTEGER                      4     0 Yes


  2 record(s) selected.


$ db2 insert into non_rmt.old values "(1,2), (2,3)"
DB20000I  The SQL command completed successfully.


$ db2 select "* from non_rmt.old"

A           B
----------- -----------

          1           2

          2           3


  2 record(s) selected.


# Run migration using logical_backup and logical_restore.
$ db2 "call sysproc.logical_backup('-type full -schema non_rmt -backup-migrate -path /tmp/backups')"


  Return Status = 0

SQL1796I  Logical backup utility has completed successfully, timestamp for the
backup image is "20230301194818".  SQLSTATE=01541



db2 "call sysproc.logical_restore('-type full -schema non_rmt -target-schema new_rmt -timestamp 20230301194818 -path /tmp/backups -enable-row-mod-tracking')"

  Return Status = 0

# Confirm restored schema and table are ready for logical backup.
$ db2 "SELECT CAST(SCHEMANAME as CHAR(10)) as SCHEMANAME, ROWMODIFICATIONTRACKING, QUIESCED FROM SYSCAT.SCHEMATA WHERE SCHEMANAME='RMT'"


SCHEMANAME ROWMODIFICATIONTRACKING QUIESCED
---------- ----------------------- --------

NON_RMT    Y                       N


$ db2 describe table new_rmt.old

                                Data type                     Column
Column name                     schema    Data type name      Length     Scale Nulls

------------------------------- --------- ------------------- ---------- ----- ------

SYSROWID                        SYSIBM    BIGINT                       8     0 No

CREATEXID                       SYSIBM    BIGINT                       8     0 No

DELETEXID                       SYSIBM    BIGINT                       8     0 No

A                               SYSIBM    INTEGER                      4     0 Yes

B                               SYSIBM    INTEGER                      4     0 Yes


  5 record(s) selected.


$ db2 select "* from new_rmt.old"

A           B

----------- -----------

          1           2

          2           3


  2 record(s) selected.

Conclusion

Logical Backup and Restore is a lightweight and versatile tool to save and restore schemas independently, without much impact to the rest of the database.

It can be used by application developers to create a restore point and use it in case a table is corrupted or accidentally dropped. Alternatively, DBAs can copy a schema from production to test environment with different (smaller) topology, as well as move data from one database to another using S3 cloud as intermediate storage.

Part 2 will contain a deep dive into more options and specifics of LOGICAL_BACKUP and LOGICAL_RESTORE, in addition it will describe how to troubleshoot the new utility.

Aleks Santars is alumnus of University of Waterloo, and has joined IBM Db2 LUW almost 20 years ago as a coop student.

To this day he is fascinated with Db2 and Db2’s impact on the world, never ending stream of challenges, and amazing team of people that are fun to work with and to learn from.

Currently Aleks manages a team in Db2 engine department, and oversees a few projects – including Logical Backup and Restore.

5 Misconceptions About Cloud Data WarehousesMarch 6, 2023 Columnar Data Engine (CDE) Storage New FeatureMarch 21, 2023