How to deadlock yourself in three simple steps

Some time ago I had a discussion with a DBA who guided the developers to use a singleton SELECT instead of fetching from a cursor in cases where just a single row would be selected. Singleton SELECT is a select statement that returns a single row at most. Even though I did not know any further details about their environment, it seemed obvious to me that this would always be the best solution for readability and performance, because by using a singleton SELECT you avoid all the overhead of OPEN, FETCH, CLOSE cursor. However, that discussion lit a fire in me and I still kept thinking about possible scenarios where a singleton SELECT could not necessarily be the best choice.

Later, I came across a great Redbook DB2 9 for z/OS: Resource Serialization and Concurrency Control, where I found a simple use case where a singleton SELECT statement could cause interesting issues and surprising results. By the way, even though the Redbook is for DB2 9, most of the information in there is still relevant and I encourage you to read it if you are about to study serialization and concurrency control in more details. (Moreover, you can even find a prior version of that Redbook, which was written for DB2 4 and which I found interesting as well.)

Anyway, what I would like to discuss in this article is a simple scenario, that shows that a singleton SELECT can result in surprising application behavior if not used properly. As I got inspired by that Redbook, I will use a very similar use case here, so the credit goes to the authors of the Redbook.

A brief recapitulation of locks and isolation levels

Throughout the article we will use different locks that DB2 acquires for SQL statements and different isolation levels you can use for concurrency control. Let me do a brief recapitulation of the basic terms so that the content of the article is more understandable.

Basically, DB2 uses locks to ensure that no process accesses data that has been changed, but not yet committed, by another process, and to ensure that an application cannot access dirty data (partial or incomplete data).

The lock size is the amount of data that is controlled by that lock. DB2 allows these sizes - Table space, table, partition, page, row, LOB, and XML. We will only deal with row and page locks in this article.

The important property of each lock is its mode, which DB2 uses to determine if one lock is compatible with another. The locks are said to be compatible if the mode of a lock for process A permits the lock acquired by the process B. If the locks are not compatible, B cannot proceed and must wait until A releases its lock.

There are three row and page lock modes we will use:

  • S-lock (share) - the lock owner can read but not change the page or row
  • U-lock (update) - the lock owner can read but not change the page or row. The important difference between U and S locks are their compatibility, which we will see in the compatibility table below. U-locks can reduce the deadlock situations.
  • X-lock (exclusive) - the lock owner can read or change the page or row. Concurrent processes cannot acquire S, U, or X locks on the same resource (page or row) while the owner stills holds the X-lock. Please note that there are few exceptions to this, for example the isolation level UR (see below) does not acquire locks and as such permits users to read even uncommitted data. However, these situations are out of the scope of this article.

The compatibility of page or row lock modes is summarized in the following table. It shows which lock modes are compatible.

Lock mode  S-lock  U-lock  X-lock
S-lock yes yes no
U-lock yes no no
X-lock no no no

 

What will be most important in this blog post is the following property - if process A holds S-lock, then process B can acquire S-lock as well - they are compatible. However, if the process A holds U-lock, the process B cannot acquire U-lock on the same resources - they are not compatible.

Do not however confuse the matrix above with the lock promotion. Lock promotion means that an owner of a lock can exchange that lock for a more restrictive one on the same resource. For example, an owner of an U-lock can promote the lock to X-lock once he updates the data.

The other property of a lock is its duration, which is the length of time the lock is held. There are several factors that influence the duration of the lock, for example you can control it with the isolation level used. Please note that the page and row locks are rarely held beyond the commit point.

The isolation level an application uses will also greatly impact the level of concurrency across that application, as well as other applications with which it shares data. DB2 uses one of four isolation levels for a connection. The isolation level can be specified at the application level via a bind parameter or connection attribute, and at the statement parameter via an isolation clause or a prepare attribute. DB2 supports these isolation levels:

  • Repeatable Read (RR) in which DB2 ensures an application with isolation level RR doesn’t read a row that another process has changed until that other process has released the row, and that other processes don’t change a row an application with isolation level RR has read until that process commits or terminates. RR is the most restrictive isolation level and ensures the highest level of stability along with the lowest level of concurrency.
  • Read Stability (RS) in which DB2 ensures that an application with isolation level RS doesn’t read a row that another process has changed until that other process has released the row, and that other processes don’t change a row an application with isolation level RS has read until that process commits or terminates (except for newly inserted rows or updated rows that don’t satisfy the application’s search condition). RS, like RR, provides a very high level of stability along with a very low level of concurrency.
  • Cursor Stability (CS) in which DB2, like RR and RS, ensures that an application with isolation level CS doesn’t read a row that another process has changed until that other process has released the row. Unlike RR, however, CS allows other applications to change rows that the application with isolation level CS has read. This provides for a high level of stability with a high level of concurrency and is the default isolation level.
  • Uncommitted Read (UR) in which DB2 ensures nothing. That is, an application reading with isolation level UR can read data that other applications have changed and haven’t yet released. Likewise, other applications can update the data that an application reading with isolation level UR has read. This provides for the highest level of concurrency, but provides for no data stability, as a UR reader can read “dirty” data that may eventually be rolled back. This is obviously not suitable for transactional applications, but can have perfect uses in the Business Intelligence or Analytics type applications.

The vast majority of applications execute with an isolation level of CS and, for the most part, this is satisfactory. However, if a higher level of performance and concurrency are desired, this isolation level may not be good enough. In these situations, an isolation level of UR may be desired or concurrent access resolution may be considered. In this article we will use mainly two isolation levels - CS and RS.

As you would expect there is much more details about locks and isolation level than written above, but hopefully this will be enough for us to understand the content below. For more details see the references at the end of the article.

Use case

Now let's go back and consider the following simple use case, which by the way I think is pretty common. We keep orders in a table ORDERS. Each order has a unique order number, which is generated by the application and the next available value is kept in a separate table COUNTER.

Assume the following simplified DDL for those two tables:

CREATE TABLE COUNTER
(NEXTORDER INTEGER
);

CREATE TABLE ORDERS
(ORDERNO INTEGER,
-- some other columns of the order
);

NEXTORDER column in the COUNTER table will hold the future order number and there will be just a single row in this table. In our case let the NEXTORDER be in a sequence order, i.e. each update will increase it just by one. The initial value will be 1.

You might object that in such situation the counter table could eventually be replaced by SQL sequences or an identity column. That is true, on the other hand, the NEXTORDER number could also have some special properties that need to be fulfilled by the application. At this moment, let us keep this over-simplified scenario.

Releasing locks too early - Corrupted database

Now let's assume the following pseudo-code of an application that places an order:

-- select the available order number
SELECT NEXTORDER INTO :current_order FROM COUNTER;

-- update the next available order number
UPDATE COUNTER SET NEXTORDER = NEXTORDER + 1;

-- insert the new order
INSERT INTO ORDERS VALUES (:current_order, ...);
COMMIT;

Looks pretty easy, right? We use the singleton SELECT to get the next available order number. Now, what can happen if the code is bound with the default settings - isolation level CURSOR STABILITY (CS) and with CURRENTDATA(NO)? Let's say there are two users that are placing different orders concurrently - user A and user B. The sequence of operations could be like this:

  1. User A - SELECT NEXTORDER FROM COUNTER
    • let's say current_order is 123
    • thanks to CS isolation level, no lock will be held on the row from the COUNTER after the singleton SELECT finishes
  2. User B - SELECT NEXTORDER FROM COUNTER
    • current_order will be the same as in previous case, 123
    • again, no lock will be held on the row from COUNTER after the SELECT finishes
  3. User A - places the order and updates the counter
    • UPDATE COUNTER, new value will be 124
    • INSERT INTO ORDERS, with current_order = 123
    • COMMIT;
  4. User B - places the order and updates the counter
    • UPDATE COUNTER, new value will be 125
    • INSERT INTO ORDERS, with current_order = 123
    • Now, what just happens here. Both users A and B are using the same order number - 123, but for different orders! If there is a unique index on table ORDERS, the insert in B will fail because of the duplicate order. If there is no unique index, a new row with the same order number will be inserted. Is this really what we wanted?

This example shows that a concurrent activity with the previous setup may cause unsuccessful inserts or, which is worse, can break the consistency of data. The singleton SELECT does not look like a correct solution here because the lock on the row from the COUNTER table is released too early.

Also notice that in our case there is only a single row in the COUNTER table, however if we had multiple rows (for example for each department) we would need to code a proper WHERE clause for the singleton SELECT, otherwise we would get an SQL error. If you need to ensure that only one row is returned (i.e. a singleton SELECT), code FETCH FIRST ROW ONLY clause and consider using ORDER BY as well to control which row is returned.

Compatible locks - Deadlock

OK, the problem with the previous scenario was that we were releasing the lock too early. A singleton SELECT under CS isolation level does not hold any lock after the query. What could we do about that? Maybe the idea could be to make the lock duration a bit longer - more restrictive isolation level could help in this case. If we keep the code and switch the isolation level from CS to Read Stability (RS) this is what will happen - RS guarantees that any concurrent updates or deletes to the qualified rows are prevented. So, it seems it could help us right? The first one user that runs the SELECT gets the S-lock, which will be kept until COMMIT. Well, it appears that it will not work that nicely in case of parallel operations. Again, let's assume we have concurrent users A and B, placing the orders in almost the same time.

  1. User A - SELECT NEXTORDER FROM COUNTER
    • let current_order = 123
    • There is an S-lock on the row from the COUNTER table
  2. User B - SELECT NEXTORDER FROM COUNTER
    • There is already an S-lock from the user A, but S-locks are compatible so B will get S-lock as well on the row from COUNTER.
    • current_order is again 123
  3. User A - place the order and update the counter
    • User A tries to update the COUNTER and therefore acquiring X-lock, but in this case the UPDATE needs to wait until S-lock from B is released, because S and X locks are not compatible.
  4. User B - place the order and update the counter
    • we are in the same situation as user A. The UPDATE in B is currently waiting for A to release the locks, but nor A and B cannot proceed further until IRLM detects this deadlock situation and rollbacks one of them. After this interval, the other program can the continue, but the another one gets -911.

What happened is that we created a deadlock scenario in three simple steps:

  1. We used a singleton SELECT in order to obtain the next value of the counter
  2. We used Read Stability isolation level that made the lock duration of the S-lock longer
  3. We executed the activities concurrently.

If both users already acquired S-locks they will get into the deadlock situation because none of them could acquire X-lock for UPDATE. Remind - S-locks and X-locks are not compatible.

Possible (correct) solutions

So, we have seen two possible problems with our simple scenario - either corrupting a database, or causing deadlocks. It seems that isolation levels will not help us here. Maybe, the problem is really with the singleton SELECT? If so, what could be good candidates how to correctly deal with this situation?

Using an UPDATE cursor and a positioned update

This was actually my initial idea and it relates to my initial question - if there could be a situation where a fetch from a cursor may be a better solution than using a singleton SELECT. It shows that a cursor even for fetching one row may have sense. Though there is a slight performance penalization caused by the cursor management. How the pseudo code would look like in this case?

-- declare cursor for update
DECLARE CURSOR C1 FOR
SELECT NEXTORDER FROM COUNTER
FOR UPDATE OF NEXTORDER;

-- open cursor
OPEN CURSOR C1;

-- fetch the available order number
FETCH FROM C1 INTO :current_order;

-- do a positional update
UPDATE COUNTER SET NEXTORDER = NEXTORDER + 1 WHERE CURRENT OF C1;

-- close the cursor
CLOSE CURSOR C1;

-- insert the new order
INSERT INTO ORDERS VALUES (:current_order, ...);
COMMIT;

With this code, we can also switch safely back to the CS isolation level. You might ask, what makes the real difference here? The difference is that the cursor is declared with the FOR UPDATE OF clause, which guides DB2 to use the U-lock in FETCH - and remember U-locks are not compatible.

So, if A gets U-lock when fetching from COUNTER (which will be eventually promoted to X-lock due to the update), B cannot proceed further until A commits and releases the lock, because U-locks are not compatible. Effectively, we have created a mutual exclusive section, and so the data corruption nor a deadlock cannot occur. What can happen though is that B gets a timeout if B is waiting for the lock too long because A is doing very long business in its step. However, that would be another story more related to proper commit strategy.

Please note that besides the FOR UPDATE OF clause, which is useful for cursors that will be used for positional updates, there is also a FOR FETCH ONLY clause that declares the intention for read only fetches. This helps DB2 to avoid ambiguous cursors and to acquire proper locks, which can reduce the lock contention of read only applications. It is a good practice to always code FOR FETCH ONLY or FOR UPDATE OF

  • depending on your application.

Update the counter first

The simple scenario we use in this article can be simply solved using a simple update at the beginning, which takes X lock directly from the start. That would suspend B until A commits.

-- update the next avail order and select the old value
SELECT NEXTORDER INTO :current_order FROM OLD TABLE
(UPDATE COUNTER SET NEXTORDER = NEXTORDER + 1);
-- insert a row
INSERT INTO ORDERS VALUES (:current_order, ...);
COMMIT;

Here we update the counter at the first place. If a concurrent activity just updated the value, we must wait until it commits - the X-lock acquired by an UPDATE will prevent surprises. Otherwise, if there is no X-lock, we can continue, update the counter, acquiring an X-lock, inserting a new row and committing. The key success of this solution is the SELECT FROM OLD TABLE, which updates the row and returns the original value (OLD TABLE) prior the update in a single statement! So, in fact we are back to the original question and it seems that the singleton SELECT is a perfect and probably most elegant solution here.

Use optimistic locking

Optimistic locking is another solution that came to my mind. Although it would require some changes - both in DDL and in the code. The idea behind the optimistic locking is having a timestamp (or a token) in the COUNTER table that will be updated automatically after each update. It means that in our code we could check if there was some update since we read the data in that table. The DDL would require the following change:

CREATE TABLE COUNTER
(NEXTORDER INTEGER,
LASTUPDATE NOT NULL GENERATED BY DEFAULT FOR EACH ROW ON UPDATE AS ROW
CHANGE TIMESTAMP
);

Since then, DB2 will automatically record any updates and will populate the LASTUPDATE column for us. The code however, needs to be changed a bit as well:

START:
-- select the avail order number and last update timestamp
SELECT NEXTORDER, LASTUPDATE INTO :current_order, :last_update FROM COUNTER;

-- update the counter only if there was no other update
UPDATE COUNTER SET NEXTORDER = NEXTORDER + 1 WHERE LASTUPDATE = :last_update;
-- in case of SQLCODE = +100 retry processing
if SQLCODE == 100
GOTO START;

-- insert the row
INSERT INTO ORDERS VALUES (:current_order, ...);
COMMIT;

In the singleton SELECT statement we query the LASTUPDATE column in addition to the NEXTORDER. Then, when we try to update the next available order number, we use the LASTUPDATED value in the update. If the value has not changed - no one else updated the value - we update the value and acquire an X-lock on that row or page. If there was any concurrent activity that updated the value after we selected but before we updated, the LASTUPDATE value would have changed. In such case, the UPDATE statement would end with SQLCODE +100, which means that a row that matches such criteria was not found. In this case, the application should perform a new select because the value of NEXTORDER has been updated.

The optimistic locking is usually recommended in the situations where you are optimistic that no one will update the value. If there is a great chance that the concurrent updates happen frequently, this is not a best solution and is usually better to rely on the DB2 locking.

Use sequences or identity column

The last option I will mention here is to dismiss the COUNTER table and use SQL SEQUENCE or IDENTITY COLUMN instead. It however does not work in all cases, as I mentioned in the beginning maybe the order number is not sequential and must conform to some rules guaranteed by the application. So as usual, it depends.

Conclusion

The intention of this article is to give you a heads-up that even simple scenarios can mean interesting problems when running in a concurrent environment. Concurrency control, locking, and lock avoidance techniques are very large and complicated topics and we touched just a tip of the iceberg.

Should you have any other tips how to deal with the scenarios discussed here or any additional comments, please share your ideas in the comments below.

Last, but not least, I would like to thank to Dan Luksetich for contributing to this article, especially to the section about isolation levels.

References

1 Like
Recent Stories
Managing Multiple DB2 Instances

Introduction to XML in DB2 11 for z/OS – Part 6 – XQuery basics

DB2 11 Fundamentals for z/OS