Merge - by Emil Kotrc

MERGE

Some time ago, I began discussing several features of various relational database management systems with my friends. One of the discussed topics was the MERGE SQL statement and the alternatives. MERGE is common in the popular commercial DBMS like DB2, Microsoft SQL Server, or Oracle. All of these DBMS implement the SQL standard of MERGE to a certain extent (there are two standards - 2003 and 2008) and add some extensions. The other DBMS (for example MySQL) usually implement non-standard version of MERGE, called UPSERT, INSERT ON DUPLICATE, or REPLACE INTO, but these implementations are out of the scope of this blog.

Let’s have a brief look at the MERGE statement implementation in DB2. As I am primarily working on DB2 for z/OS, I will focus on this flavor of DB2. However, if there are any significant differences in the MERGE implementation in DB2 LUW, please share them in comments.

Overview

In one sentence, MERGE statement can be seen as a combination of INSERT and UPDATE SQL statements (DB2 LUW supports DELETE as well). Imagine you want to INSERT or UPDATE a row depending on the existence of a row. If the row does not exist in the target table, you want to insert a new row, if the row already exists, you want to update it. You can implement this behavior either by using your application logic or by using the MERGE statement.

The MERGE statement was introduced in DB2 9. Its syntax seems a little bit complicated, but it is not that bad once you understand what you want to use MERGE for. Basically, you need few things:

●     target table is where you want to insert the new row to, or where you want to update a row (please note that MERGE supports even multi row operations, but let’s keep it simple)

●     source table, which you define within the MERGE statement, and this can be single or multiple rows table. You will use this source table to insert/update the rows from.

●     search condition that specifies the join condition for the target and source tables. If the condition is evaluated to true, MERGE will execute the WHEN MATCHED modification operation (UPDATE); if the condition is evaluated to false, MERGE will execute the WHEN NOT MATCHED operation (INSERT).

●     modification operation is either WHEN MATCHED operation that allows the UPDATE operation only; or WHEN NOT MATCHED that allows using the INSERT statement.

Piggybank example

Sometimes it is better to play with things using simple use cases. Are you saving money in piggy banks? Let’s say you have n secret slots with or without a piggy bank where you save coins for certain purposes. For example, the slot 1 can be for a new car (or a model), the slot 2 can be for a new book, etc. I’ll leave the final decision up to you.

Now, let’s say you have a coin in your hand and you want to put it in a randomly selected slot, let it be slot 5, but you are not sure if the piggy bank is in there. If the slot is empty (no value for the slot exists) you need to insert a new piggy bank first and insert the first coin inside. This will be our INSERT operation.

On the other hand if the piggy bank is already in there, a row for that slot exists and the value is greater than zero. By inserting a coin we will increase the slot’s value, meaning we update the piggy bank value as the old value plus the coin value. This will be our UPDATE operation.

(Yes, we could implement this scenario using just the UPDATE statement, but let’s have something simple to play with).

DDL for this example can be pretty simple. We can have just one table SAVINGS with three columns - the slot number, the value in it, and an optional brief description:

CREATE TABLE SAVINGS
(     SLOT        SMALLINT NOT NULL PRIMARY KEY,
      VALUE       DECIMAL(10,2) NOT NULL
      DESCRIPTION VARCHAR(128)
);

CREATE UNIQUE INDEX SAVINGS_SLOT ON SAVINGS(SLOT);

First, let do our example using the program logic. Assume the host variable slot contains the slot number we want to insert/update (we’ve mentioned slot number 5), let coin host variable contain a value of the coin, and for insert we also need a description on desc host variable.

Basically, we have three options: 1) do INSERT first, or 2) do UPDATE first; each statement followed by the other operation, or 3) run a SELECT followed by INSERT, or UPDATE.

The pseudo code for the first case could look like:

INSERT INTO SAVINGS VALUES(:slot, :coin, :desc);

if (SQLCODE == -803)
      //duplicate entry -> row already exists
      UPDATE SAVINGS SET VALUE = VALUE + :coin WHERE SLOT = :slot;

Endif

The logic is pretty simple, we just try to insert a new row and check the SQLCODE. If it is -803 (SQLSTATE = 23505), which means a duplicate entry, we know that there is already a piggy bank and we will update its value instead of inserting.

 

The second case could be coded like:

UPDATE SAVINGS SET VALUE = VALUE + :coin WHERE SLOT = :slot;

if (SQLCODE == 100)
          INSERT INTO SAVINGS VALUES(:slot, :coin, :desc);
Endif

 

In this case we try the UPDATE first. If no record has been updated, we get a warning and SQLCODE 100. In such case, we perform an INSERT.

Finally the last case could be as:

 SELECT VALUE FROM SAVINGS WHERE SLOT = :slot;

if (SQLCODE == 0)
            // row found 
            UPDATE SAVINGS SET VALUE = VALUE + :coin WHERE SLOT = :slot;
      else if (SQLCODE == 100)
            // row not found
            INSERT INTO SAVINGS VALUES(:slot, :coin, :desc)
Endif

 

All in all we need two DB2 calls to perform the correct operation, unless we have an oracle (with low o :-)) of course. How this logic would be coded using the MERGE statement? There would be just a single SQL statement:

      MERGE INTO SAVINGS AS TARGET
            USING (VALUES(:slot, :coin, :desc)) AS SOURCE(SLOT, VALUE, DESC)
      ON TARGET.SLOT = SOURCE.SLOT
      WHEN MATCHED THEN UPDATE SET VALUE = TARGET.VALUE+SOURCE.VALUE
      WHEN NOT MATCHED THEN INSERT (SLOT, VALUE, DESCRIPTION)
VALUES (SOURCE.SLOT, SOURCE.VALUE, SOURCE.DESC);

Where

●     the target table is SAVINGS as specified in the INTO clause and named as TARGET

●     the source table is defined in the USING clause and named as SOURCE, please note that this is a one row table

●     the search condition (also known as a join condition) is TARGET.SLOT = SOURCE.SLOT; i.e. we are trying to match the slot number in the target and source tables

●     the modification operations are UPDATE specified in the WHEN MATCHED branch; and INSERT specified in WHEN NOT matched.

The abovementioned MERGE statement can be rerun regardless the row for the specific slot exists or not without the need of any specific program logic.

And what if we wanted to withdraw all the money from a piggybank within a certain slot? We can just select the value and destroy the bank using the DELETE. Simply as:

      SELECT VALUE FROM OLD TABLE (DELETE FROM SAVINGS WHERE SLOT=:slot);

This neat feature is available since DB2 10, isn’t that nice?

Conclusion

Is it better to use the program logic or the MERGE statement? We are dealing with databases right? So the ultimate answer is: it depends. However, my personal preference is to use the MERGE statement whenever possible because it means fewer calls to DB2 and it simplifies the code logic even though the MERGE statement seems to be more complicated than separate INSERT and UPDATE. But the pros and cons might of course depend on your environment and/or preferences; however, at least you have another option to consider.

References

●     MERGE on wikipedia

●     MERGE for DB2 for z/OS

●     Inserting and updating rows in a single operation

 

Emil Kotrc
CA Technologies
Emil.Kotrc@ca.com

2 Likes
Recent Stories
Introduction to Db2 for z/OS System Profiles

10 questions to review and understand your active log data set configuration

DB2 for z/OS query support over FTP