Back to Basics: Algorithms for Recovery and Isolation Exploiting Semantics

April 5, 2022

Posted By: Emil Kotrc in Technical Content,

Most likely, you understand how logging in Db2 for z/OS works in general. There are log buffers, active logs, which get offloaded to archive logs, and all is being tracked in the bootstrap dataset. For more details about the basics refer to Martin Hubel's IDUG article. All conceptually pretty clear.

However, have you ever heard about ARIES (Algorithms for Recovery and Isolation Exploiting Semantics)? The link before points to wikipedia where you can find more information, most importantly you can find there a link to a paper written by IBM Fellow, Dr. C. Mohan. Dr. Mohan's paper on ARIES is one of the most influential work in the database management area.

ARIES is a set of algorithms that define the basics of concurrency control and recovery management. There are couple of techniques to implement transaction processing and ARIES is the one used in Db2. Even in ARIES there are many nuances and we will not go too much deeply into them. In this article, I just wanted to remind basics of ARIES principles and show the respective pieces in Db2. Hopefully, this will be for you a useful trip into the very basics of Db2 foundations.

The wiki summarizes three main principles of ARIES as

write ahead logging (WAL)
repeating history during Redo
Logging changes during Undo

Let's discuss the principles in details in Db2 for z/OS perspective:

ARIES building blocks

Write ahead logging

Write ahead logging as one of the main principles of ARIES means that any change done to a database object such as inserting, updating, or deleting a row is recorded in a log before it gets offloaded to disk to a physical tablespace. Hence, this technique is called write ahead logging. If you commit your changes, Db2 guarantees the data is saved into a log dataset. The changes do not necessarily need to be mirrored in the tablespace yet. The log in this sense in Db2's perspective is the log buffers (see the OUTBUF ZPARM) and active/archive logs.

Db2 writes log records into the log. IBM provides a mapping macro for those records and you can find the macro in SDSNMACS(DSNDQJ00). The macro is a bit difficult to read, but you don't need to understand it. That is why there are log analysis tools that help to analyze the data and display the relevant information. If you have nothing else, you still have DSN1LOGP utility, which can at least format the log records.

In fact, all is a bit more difficult, because log records are saved in VSAM Control Intervals and log records may even span multiple CIs. You really don't want to do this business all on your own. But if you still want, IDCAMS can be your friend, I've warned you.

Anyway, back to log records. Every log record in Db2 has a log record header. Header possesses important data, such as record type (we will touch some), Db2 version that created this record, some other control information, but most importantly it includes log record sequential number (LSN). LSN which is another building block of ARIES. In Db2 terms, this can be the relative byte address (RBA) or log record sequence number (LRSN). RBA is based on the physical offset in the log if you think about the log as a huge ever growing file, while LRSN is based on a store clock timestamp and is used in a data sharing environment as each member has its own log. RBA's and LRSN's common characteristics are that they are ascending and unique 10 byte numbers.

The data changes that your applications does are sooner or later propagated into the data and index pages of the corresponding table/index spaces. The technique that Db2 uses to update the pages, is called in-place updates. Db2 updates the existing pages, while an alternate technique used in some other RDBMs, called shadowing, would write the updates to the new pages.

Repeating history during redo

I've mentioned that the log record header contains a type of the log record. There are couple of types and subtypes of Db2 log records. You may refer to the DSNDQJ00 for the full list, for sake of simplicity of this article I will just discuss the following families of types:

unit of recovery logs
checkpoints

What does it mean repeating history during redo? If there is a need of a recovery, Db2 repeats the history of what has been done. In such a case Db2 replays the Db2 log and this is called redo. Imagine, rewinding your tape, and playing the sound again. Wait a moment, you no longer use tapes?

Anyway, as we've learned in the prior section, Db2 does not flush the changes directly to the underlying tablespace VSAM datasets, but guarantees that the data are written to the Db2 log first. Hence, Db2 cannot rely on the data in the physical tablespaces, but can rely on the Db2 log in case of any issues or rollbacks.

For most of the cases applications do inserts, deletes, or updates. This is what gets recorded to the log:

in case of an insert, Db2 records the full new row. This is called redo data. If Db2 replays the log, does the redo, it performs the insert again if needed.
in case of a delete, Db2 marks the row as removed, but it also keeps the original row before the deletion in the log record. Why? Besides the redo information, I've described above, Db2 also keeps the undo information. The undo part of log record is used in case of transaction abort or rollback. In case of rollback, Db2 needs to revert the changes and insert the deleted row back. More on that in the next section.
for updates, Db2 records both parts - redo and undo data. Redo portion of the log record contains the new piece, while the undo portion contains the prior values. If you take a look into the DSNDQJ00 you will see couple of interesting points related to update log records. There are many different flavors of updates (I recall 18 variants), but what is important is that there are two families of update log records - full logs or non-full logs. Full logs are recorded if the associated table is created with CAPTURE DATA CHANGES. In this case Db2 records the full row image. In case of non-full logs, CAPTURE DATA NONE, Db2 records only a partial row image, which can be just a byte within a row that has changed or some other bigger part of the row. A side note about log analysis tools. If your log records do not contain the full image, but you still want your log analysis to show you the entire row, what options does it have? It can grab the full row from the tablespace datasets, from an image copy, or even from another log record, such as from an insert. However, in all these cases, you need to understand that the log analysis tools need to do some extra work to complete the rows. Unless, you have some very good reasons to not use DATA CAPTURE CHANGES, use DATA CAPTURE CHANGES. Also, please note that DATA CAPTURE NONE is the default setting.

Logging changes during Undo

We have learned that Db2 records the data changes in the Db2 log and that it saves the changes in the redo or undo parts of the log record. What happens, if there is a rollback or Db2 needs to undone some work, such as abort inflight transactions? Db2 uses the undo part of the log records to rollback the changes back, but it also does an additional thing, which is another important building block of the ARIES. Db2 records the rollback changes using another set of log records. I will say it differently, if Db2 applies the undo part of the log records for any reason, it also writes extra log records to track this operation. These types of log records are called compensation log records as they compensate the work done before. Why are compensation log records that important? They basically allow Db2 to apply only redo parts of the log records during recovery and Db2 does not need to care about the redo/undo. Imagine you do an insert and then rollback. Db2 writes an insert log record, then it hits the rollback so it needs to reverse the operation. In this case you can think about it as an delete, which gets also recorded in the log as a compensation log record. Now, if there is a crash or a later recovery, Db2 will just replay the log and apply the log records. In our simple scenario, Db applies the insert first, then applies the delete of the same row, and we are back to the original state as if the row never existed. No need to care about what happened, just reading and applying the redo part of the Db2 log records.

Checkpoints

I guess you don't trust me that all is that simple and you are right. There is way more that what I just explained. There are couple of details missing. Db2 records the redo/undo data into the log, but it also needs to keep track of the transactions themselves in order to correlate the log records. If you start a transaction, there is a special log record - Unit of Recovery start, that opens the transaction. Then all log records that are written within the same unit of work, are identified by the Unit of Recovery (UR) ID in their header. Unit of recovery ID is simply the RBA of the UR begin log record. Moreover, all log records within the UR are chained together so that Db2 can easily find them when doing rollback for example. Unit of recovery ends by an UR end record, which can simply mean commit or rollback. In case of rollback, Db2 goes back with the log records and removes the changes using the undo data. As written above Db2 also writes the compensation log records that record the rollback activity. Such activity ends with an end abort log record.

Transactions may come and go and Db2 needs to keep track of them. I just mentioned how DB2 keeps track of the transactions using the Begin/End Unit of recovery log records. However, that is not the only information Db2 has. Db2 also records in the log the current state of transactions in regular intervals called checkpoints. Checkpoints are another building block of ARIES and consist of several types of transactions and I will just mention two:

unit of recovery summary
page set summary.

Unit of recovery summary checkpoint records basically dump the Db2 control block that maps the current state of each transaction that is running. The page set summary records record the log record sequence numbers that were not externalized to disk. This helps to identify so called dirty pages. How does Db2 check whether a record was applied to a page? Every page contains the log sequence number of the last record that was applied. If this page log sequence number is lower that the log record's sequence number, then the log records hs not been applied, isn't that elegant?

The checkpoint itself consist of begin checkpoint and end checkpoint records, and many checkpoint records within. Completed checkpoints are recorded in BSDS. Please note that while check-pointing, Db2 still continues processing, accepting new transactions, etc. So, if you take a look into the log in the time of checkpoint, you may still see other log records being written as well.

Why are checkpoints important? In case of a crash or restart, Db2 starts with the most recent checkpoint and rebuilds the state - all transactions that were running and the state of the data on disk. With all this information, Db2 can then start applying the logs and in case of aborting the transactions, undoing the changes, and writing the compensation log records.

Putting it all together

Now we can finally put it all together. If Db2 needs a recovery (not just a system, but even an object based) it first needs to rebuild the last state. Db2 will use checkpoint records to reestablish the transactions states and to have understanding of the dirty pages. Then it needs to repeat the history using redo data, which may include the redo of compensation log records. Then the final stage comes to play, which is rollback of the transactions in abort, which undoes any uncommitted work and writes compensation log records for any future recovery purposes.

Hopefully, this all gives you a basic understanding of the concepts of ARIES, especially terms such as:

write ahead logging
redo and undo data
checkpoint data.

Author: Emil Kotrc, Broadcom
Created: 2022-04-04 Mon 15:57

Back to Basics: Db2 z/OS Access Paths and Db2 ExplainMarch 29, 2022 Db2 for z/OS Dual Logging: zHyperWrite and zHyperLink—are they related?April 13, 2022