Basic Tablespace Recovery
(by Tony Andrews)
Db2 provides means for recovering data to its current state, as might be required after a system failure, or to a specific earlier state. The units of data that can be recovered are tablespaces, datasets (partitions), single pages, or RBA ranges in the log. In a distributed network consisting of multiple Db2 subsystems, if you recover one subsystem to a certain point in time, you may also need to recover to the same point in time some or all of the other subsystems defined in the network. In a referential set of tables, you may also need to recover the set of tables together.
To ensure that a tablespace can be recovered to a particular point, there must be a copy of its data at some earlier state. That is called a backup image copy and gets created with the COPY utility. The Db2 recovery log contains a record of all changes made to the tablespace. If Db2 fails, it can recover the tablespace by restoring the backup copy and applying the log changes to it.
Development at your site of backup and recovery procedures is critical to avoid costly and time-consuming losses of data.
In general, the more often you make image copies, the less time required for recovery because the image copy is closer to the recovery point and less log data needs to be read and applied. But the cost is that more time is spent generating the image copies which may (or may not) ever be used.
If you use LOG(NO) when running the LOAD or REORG utilities, then the tablespace is placed in a copy pending status; you must remove it, generally by making an image copy, before making any further changes to the data.
If you use LOG(YES) in the tablespace definition, and log all updates, then an image copy is not required for data recoverability. However, taking an image copy makes the recovery process more efficient.
By running the COPY utility, image copies can be a full copy of the data in the tablespace or can be an incremental copy. The incremental copy backs up only the pages within the physical tablespace dataset that have changes since the last copy. Incremental copies can be done only when the tablespace is defined with the TRACKMOD YES option.
Monday morning: Full image copy of TSPACEX
Tuesday morning: Incremental copy of TSPACEX
Wednesday morning: Incremental copy of TSPACEX
Thursday morning: Incremental copy of TSPACEX
Friday morning: Incremental copy of TSPACEX
Friday afternoon: Unsuccessful write operations to TSPACEX occurring (for example a disk hardware corruption or the accidental deleting of a tablespace file). Another reason to recover is due to an ‘application disaster’ where an application program(s) updates data incorrectly and recovery is needed to a previous point in time.
NOTE: Each time the COPY utility is executed, inserts are automatically done to the following:
SYSIBM.SYSCOPY, a catalog table, contains history information about utilities that impact recoverability that are run against a tablespaces.
SYSIBM.SYSLGRNX, a directory table, contains information about recovery log ranges providing a fast and efficient way to access appropriate log records for recovery of a tablespace. Ranges are also written when a tablespace is closed and other events. This minimizes having to read through (scan) many records in the log that do not pertain to the tablespace during a recovery process.
Other utilities and actions also place rows in the SYSCOPY table to track the recoverability of Db2 objects. The columns ICTYPE for a row can be any of the following:
B REBUILD INDEX
D CHECK DATA LOG(NO)
E RECOVER (to current point)
F COPY FULL YES
I COPY FULL NO
M MODIFY RECOVERY utility
P RECOVER TOCOPY or RECOVER TORBA
R LOAD REPLACE LOG(YES)
S LOAD REPLACE LOG(NO)
T TERM UTILITY command
V REPAIR VERSIONS utility
W REORG LOG(NO)
X REORG LOG(YES)
Y LOAD LOG(NO)
Z LOAD LOG(YES)
Along with each entry is the date and time, the utility or action, and an RBA log entry. RBA stands for Relative Byte Address and is an address offset from the beginning of the log file. This is essential is recovery when it comes to recovering to specific point in time (specific RBA) or to recover to current from the last image copies.
Here is a query against the SYSCOPY table to show an example of its. This is specific to a tablespace TSPROJ in database DB3020PR.
As you can see from the output, there have been a number of actions on this tablespace:
C = Tablespace created
Y = Tablespace loaded (from a LOAD utility) with LOG NO. This would have put the tablespace in a
‘Copy Pending’ status meaning the tablespace is unavailable until a COPY utility is run against
F = Full image copy.
Q = Quiesce point set.
From this simple example, it becomes really easy to recover this tablespace using the RECOVER utility. If you want to recover to current time, you run the following. Note that in recovering a tablespace you will need to rebuild all the indexes unless those have been backed up at the same time as the tablespace. If that is the case then the indexes would also have to be recovered.
If you wanted to recover to a very specific point in time only, you take the RBA that is in the SYSCOPY table. This job will recover to the above quiesce point in time.
What happens during the Recovery?
- The last full image copy is copied to the tablespace.
- Any later incremental image copies; each summarizes all the changes made to the tablespace
from the time the previous image copy was made.
- All log records created since the most recent image copy.
In summary, DBAs need to:
- Determine how often to tack image copies
- Consider full image copies vs incremental
- Consider retention period of the image copy dataset
- Consider how quickly would they need to recover each tablespace. The more current the image copies the less time to recover.
- Consider how much of the active log records to keep.
- Consider how long to keep the archive log datasets. Many production shops try to keep 12 to 24 (or more) hours of active log datasets. Archive logs might be kept for reasons other than recovery. You might keep them for long term data change investigation and log analysis. If someone asks why or who changed this data/row about N days ago then it is handy to have the archive log from N days ago!
- Consider SHRLEVEL CHANGE vs SHRLEVEL REFERENCE when running the COPY COPY with SHRLEVLE CHANGE is relatively friendly with parallel application processes and has minimal impact to them. COPY with SHRLEVEL REFERENCE provides image copy within the LIST that are consistent and useful for potential later UNLOAD from image copy.
- Practice RECOVERY before you need to do it! Be comfortable with this important utility before you have a real disaster and a panic to use it. At the very least, practice recovering some non-prod database objects. But this is not always similar enough to production databases in size and complexity. Ideally, find a way to practice recovery of production database objects!