
Bankdata – Looking at the DB2 12 Enhanced Merge testing

Frank Petersen, Bankdata

I have had the pleasure of spending 2 weeks of a rainy Danish summer looking at the enhanced MERGE in DB2 12 as part of the DB2 12 Early Support Program (ESP). During the testing, problems arose that made it take somewhat longer than expected, but the problems ended up benefiting even the existing MERGE – as will be seen later.

The MERGE SQL syntax was implemented in DB2 for z/OS V9, which went GA around 2007. Apparently, the implementation that was built into V9 was not ANSI standard – or at least did not adhere to the newest ANSI standard.

I checked at the Bankdata site to see how the MERGE statement has been adopted by the developers and therefore – indirectly – how well our standard-setters and educators have been able to spread knowledge of the statement within the organization.

The result was not encouraging: I found that we only have around 50 applications in the source-code library exploiting MERGE. It seems that the reasons for not using the statement more heavily at our site are the lack of the right business cases in the applications and the fact that internal education on MERGE did not start until around 2011.

In DB2 12 we get a new “version” of MERGE – or, more correctly, an Enhanced MERGE is delivered that adheres to the current ANSI standard and thus delivers a lot of new functionality. Personally, I expect that this will increase the number of candidate business cases, but (as we will see later) also that it will introduce new risks that we have to deal with – business as usual !

When looking at the new enhanced MERGE I started by deciding that I would not comment on the syntax and other aspects that are decided by the ANSI standard. A decision that I could not keep entirely – as seen later. Everyone can have an opinion on the syntax etc. but clearly we all have a mutual interest in DB2 adhering to standards to be able to take in applications written for other DBMS’s - but also to reduce the learning curve for new employees.

When I look at new/changed statements – like MERGE – there are three natural items to focus upon (now that we accept the syntax as it is): Performance, manifested in performance measurements and access path selections; Usability, manifested in savings in code hours/lines and in increased readability afterwards, meaning that it will be cheaper to build and maintain; and finally a judgment of how it fits into our specific installation – whether it introduces risks in the way we do things, and whether we are likely to use it to such an extent that it will bring some savings.

The changes to MERGE are summarized in the following list (a schematic sketch of the new syntax follows the list) :

  1. The “source” of the MERGE can now be a table, a view or a fullselect
  2. Additional predicates are allowed on MATCHED and NOT MATCHED
  3. It can now do DELETE
  4. It can now do multiple UPDATEs and INSERTs (and DELETEs)
  5. It now accepts IGNORE and SIGNAL actions
  6. A row can ONLY be operated on by ONE U/I/D – in the V9 variation a row could be operated on multiple times (!!!)
  7. Omitting NOT ATOMIC… signals that the enhanced MERGE is being used !
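To illustrate the shape of the new syntax, here is a schematic sketch of my own – the table T, the source S, the columns and the predicates are all hypothetical and only there to show where the new clauses go :

-- Schematic sketch only : T, S, KEYCOL and VAL are hypothetical names
MERGE INTO T
USING (SELECT KEYCOL, VAL FROM S WHERE VAL IS NOT NULL) AS INP   -- source can be a table, a view or a fullselect
ON (T.KEYCOL = INP.KEYCOL)
WHEN MATCHED AND INP.VAL <= 0 THEN        -- additional predicate on MATCHED
DELETE                                    -- DELETE is now possible
WHEN MATCHED AND INP.VAL > 0 THEN
UPDATE SET T.VAL = INP.VAL
WHEN NOT MATCHED AND INP.VAL > 0 THEN     -- additional predicate on NOT MATCHED
INSERT (KEYCOL, VAL) VALUES (INP.KEYCOL, INP.VAL)
WHEN NOT MATCHED AND INP.VAL < -100 THEN  -- SIGNAL as an action
SIGNAL SQLSTATE '70001' SET MESSAGE_TEXT = 'UNEXPECTED INPUT ROW'
ELSE IGNORE                               -- IGNORE : rows satisfying no WHEN clause are skipped
-- Note : there is no NOT ATOMIC CONTINUE ON SQLEXCEPTION clause. Its absence signals the
-- enhanced MERGE, and each row is operated on by at most one of the clauses above.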

When I verify new functions and features like this, I always start by finding/building a candidate scenario to run the tests upon. The target tables have to have a reasonable size to be able to judge performance but also a manageable size when it comes to the needed resetting of the involved tables to a known state to do repeated tests.

Let me state right away that the scenario turned out to be far more complex and time consuming than I expected. As a matter of fact, it was on the threshold of what my head can cope with :-). This also means that this document probably will never be read by anyone beyond page two, as it is not for the faint-hearted !!  Nevertheless, let us try :

 I have fallen for two tables that we have in our performance data warehouse (a schematic DDL sketch follows the list) :

  •  Table 1 (47 M rows) contains all statement metrics captured from IFCID 401. It contains a row per program_name, per statement-id, per day, with all of that day’s performance metrics in separate columns.
  •  Table 2 (14 M rows) contains performance metrics as well, but from standard DB2 accounting, which does not take us down to statement level. It therefore contains a row per program, per collection, per day, with all of that day’s accounting performance metrics summarized.
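To make the SQL later in the document easier to follow, here is a schematic DDL sketch of the two tables. It is a sketch only: the data types and lengths are my assumptions, and both tables contain many more columns than shown – only the columns referenced later are included.

-- Schematic sketch only : data types/lengths are assumptions, column lists are incomplete
CREATE TABLE CANDLE.H2STATEMNTMONI (      -- Table 1 : statement metrics (IFCID 401)
  DB2ID            CHAR(4)        NOT NULL,
  COLLID           VARCHAR(18)    NOT NULL,
  PROGRAM_NAME     CHAR(12)       NOT NULL,
  STMT_ID          BIGINT         NOT NULL,
  REC_ID           CHAR(2)        NOT NULL,
  READS_TIMESTAMP  TIMESTAMP      NOT NULL,
  READS_TIMEFROM   TIMESTAMP      NOT NULL,
  READS_NMB_EXEC   BIGINT,
  READS_ELA_TIME   DECIMAL(15,6),
  READS_CPU_TIME   DECIMAL(15,6),
  READA_NMB_EXEC   BIGINT,
  READA_ELA_TIME   DECIMAL(15,6),
  READA_CPU_TIME   DECIMAL(15,6)
);

CREATE TABLE CANDLE.H2ACCTPKGJSKFAP (     -- Table 2 : accounting metrics per package
  DB2ID            CHAR(4)        NOT NULL,
  SYSID            CHAR(4)        NOT NULL,
  CONNID           CHAR(8)        NOT NULL,
  PROGRAM_NAME     VARCHAR(128)   NOT NULL,
  START_TIMESTAMP  TIMESTAMP      NOT NULL,
  SQL_REQUESTS     BIGINT,
  ELAPSED_TIME     DECIMAL(15,6),
  CPU_TIME         DECIMAL(15,6)
);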

 The scenario that I try to build is purely theoretical, for the purpose of verifying the enhanced MERGE – but I could see a need to MERGE the two tables in this way :

 For every row matching on program_name, date and DB2 subsystem ID:

  1. If the row is found in both tables, take a few fields (CPU and elapsed time metrics) from Table 2 (accounting) and UPDATE them into some vacant columns in Table 1 (statement metrics).
  2. If the row is in Table 2 (accounting) but not in Table 1 (statement metrics), INSERT a row into Table 1.
  3. If, however, the elapsed time and CPU time are less than 0.5 seconds afterwards, DELETE the row.

The result will be a Table 1 (statement metrics) that, for every statement in a given package, will contain the total CPU and elapsed time for the entire program. At the same time it will give an indication of how many packages we have accounting information for (if the new fields contain values > 0). It will also give us the programs for which we have not captured any IFCID 401 data (the INSERT’ed rows). This signals that something was not captured by the IFI program (“Trace Data Lost” or similar). And finally, we will at the same time clean up any programs that use an insignificant amount of CPU/elapsed time.

Before DB2 12, we could not use a simple piece of SQL to handle this requirement.

One option would be to write an application that travels through Table 2 and does the UPDATE on matching rows; if no match is found it does an INSERT, and finally it does a DELETE for the rows with too little resource consumption.

Another option – still before DB2 12 – would be to travel through Table 2 and feed a MERGE against Table 1 with a VALUES row for every row fetched; the MERGE would then handle the UPDATE and INSERT, and we would still have to do a separate DELETE for the rows having too little resource consumption.

We will build a scenario around this situation - and illustrate it with SQL to try to understand if we can do it entirely in SQL after DB2 12, or if we will have to write an application to solve the problem.

The scenarios will be like these :

  1. Use “before-DB2 12” functionality with MERGE (with NOT ATOMIC syntax to signal not enhanced). We run on VALUES derived from a separate single-row cursor on Table 2 and the DELETE will run on a separate single-row cursor to allow us to do COMMITs during the DELETE’s.
  2. Use “before-DB2 12” functionality with MERGE using multi row. We run on VALUES derived from a separate multi-row cursor on Table 2 (ROWSET=1000) and the DELETE will run on a separate multi-row cursor to allow us to do COMMITs during the DELETE’s.
  3. Use DB2 12 ENHANCED MERGE using input from Table 2 and needed application logic to INSERT and UPDATE. This will save us from doing the separate CURSOR on Table 2 but we will do the DELETE on a separate single-row cursor to allow us to do COMMITs during the DELETE’s.
  4. Use DB2 12 ENHANCED MERGE using input from Table 2 and needed application logic to INSERT and UPDATE and DELETE as well. This will save us from doing the separate CURSOR on Table 2 and save us from doing a cursor DELETE afterwards.
  5. Outside these scenarios I will also investigate these variations :
    1. Verify if we can do several U/I/D on same row using multiple WHEN MATCHED
    2. Verify the IGNORE option
    3. Verify the SIGNAL
    4. Verify the SELECT .. FROM FINAL TABLE

 I will build these four variations into programs to judge access path selections, static performance and the amount of code work needed.

 The tables were initialized to a known state and image-copied, and it was verified that we could reset them to this state with a point-in-time RECOVER.

 After setting the stage, the tables are verified by selecting one particular date to be able to ensure that the correct processing is taking place.

 I will not bore anyone (anyone still with me at this point) with details on this, but the idea was to take different SUM’s and COUNT’s and use these for every variation to try to ensure that the processing is similar.

 We had some old observations that it seemed difficult to get DB2 to use a good access path in the MERGE in all cases. As this was assumed not to be related to DB2 12, it was not expected to be investigated further – only if I saw similar problems in the enhanced MERGE would I reopen the issue. This turned out to be very interesting !!

Scenario 1 – “Pre DB2 12” MERGE on single row VALUE / one UPDATE WHEN MATCHED / separate DELETE :

The SQL used :

Cursor to feed values into the MERGE :

SELECT SUBSTR(PROGRAM_NAME,1,12), DB2ID, START_TIMESTAMP 
,DATE(START_TIMESTAMP)
,SQL_REQUESTS
,ELAPSED_TIME
,CPU_TIME
FROM CANDLE.H2ACCTPKGJSKFAP
WHERE START_TIMESTAMP > '2016-06-01-00.00.00.000000'
AND DB2ID = 'DB2B'
AND SYSID = 'SYSJ'
AND CONNID = 'CICS'

The MERGE done per VALUE :

 MERGE INTO CANDLE.H2STATEMNTMONI STMT 
USING (
VALUES (
:MERGE1-PROGRAM-NAME
,:MERGE1-DB2ID
,:MERGE1-TIMESTAMP
,:MERGE1-SQLREQ
,:MERGE1-ELATIME
,:MERGE1-CPUTIME
)
)
AS INP(
PROGRAM_NAME
,DB2ID
,READS_TIMESTAMP
,READS_NMB_EXEC
,READS_ELA_TIME
,READS_CPU_TIME
)
ON
(
STMT.PROGRAM_NAME =
INP.PROGRAM_NAME
AND
DATE(STMT.READS_TIMESTAMP) = DATE(INP.READS_TIMESTAMP)
AND
STMT.DB2ID = INP.DB2ID
AND
STMT.COLLID IN ('TESTB','TESTB TESTB')
AND STMT.REC_ID = 'I'
)
WHEN MATCHED THEN
UPDATE SET
STMT.READA_CPU_TIME = INP.READS_CPU_TIME
,STMT.READA_ELA_TIME = INP.READS_ELA_TIME
,STMT.READA_NMB_EXEC = INP.READS_NMB_EXEC
WHEN NOT MATCHED THEN
INSERT (
DB2ID ,
COLLID ,
PROGRAM_NAME ,
STMT_ID ,
REC_ID ,
READS_TIMESTAMP ,
READS_TIMEFROM ,
READA_CPU_TIME ,
READA_ELA_TIME ,
READS_CPU_TIME ,
READS_ELA_TIME ,
READS_NMB_EXEC
)
VALUES (
:MERGE1-DB2ID,
'TESTB TESTB',
:MERGE1-PROGRAM-NAME,
47114711,
'MI',
:MERGE1-TIMESTAMP,
:MERGE1-TIMESTAMP,
:MERGE1-CPUTIME,
:MERGE1-ELATIME,
:MERGE1-CPUTIME,
:MERGE1-ELATIME,
:MERGE1-SQLREQ)
NOT ATOMIC CONTINUE ON SQLEXCEPTION

Cursor to do DELETE :  

SELECT 
PROGRAM_NAME
,DB2ID
,READS_TIMESTAMP
,READS_NMB_EXEC
,READS_ELA_TIME
,READS_CPU_TIME
FROM
CANDLE.H2STATEMNTMONI
WHERE DATE(READS_TIMESTAMP) > '2016-06-01'
AND DB2ID = 'DB2B'
AND (
(
REC_ID = 'MI'
AND COLLID = 'TESTB TESTB'
AND READS_ELA_TIME < 0.5
AND READS_CPU_TIME < 0.5)
OR
(READA_ELA_TIME < 0.5
AND READA_CPU_TIME < 0.5
AND READA_ELA_TIME > 0
AND READA_CPU_TIME > 0 )
)
DELETE FROM CANDLE.H2STATEMNTMONI WHERE
CURRENT OF MERGE-CLEAN-CURSOR1
 

 The FETCH-MERGE execution used 125 CPU seconds to merge 130225 rows in 82406 physical MERGE’s.

 I was surprised that the “old” MERGE was so CPU-intensive, and it turned out that the MERGE was not able to use the index-on-expression that I had built. I validated that this was a problem for the Enhanced MERGE as well, so I opened a PMR with IBM, resulting in APAR PI69414. This was where the testing was suspended for months while waiting for a fix. In short, it turned out that support for index-on-expression indexes had simply been forgotten by DB2 development for the two MERGE variations. The resulting fix for DB2 12 can mean a large CPU saving if you are using index-on-expression – even with the “old” MERGE.

After the fix was implemented, the CPU-time for the FETCH-MERGE went down from 125 CPU seconds to 5 CPU seconds !!!

The DELETE done afterwards in a separate run took 9 CPU seconds for deleting 112305 rows in just as many physical DELETES.
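For reference, the index-on-expression that the optimizer could not exploit before the fix is of this general shape. This is only a sketch – the index name and the exact key columns are my assumptions – but the essential part is the DATE(READS_TIMESTAMP) expression, which matches the expression used in the ON clause of the MERGE:

-- Sketch only : index name and key column list are hypothetical
CREATE INDEX CANDLE.H2STATEMNTMONI_IX01
ON CANDLE.H2STATEMNTMONI
( PROGRAM_NAME
, DB2ID
, DATE(READS_TIMESTAMP)      -- the key-expression the MERGE could not use before APAR PI69414
);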

Scenario 2 – “Pre DB2 12” MERGE on multi row VALUE / one UPDATE WHEN MATCHED / separate DELETE :

The SQL used :

Cursor to feed values into the MERGE :

WITH ROWSET POSITIONING FOR          <-- ROWSET = 1000
SELECT SUBSTR(PROGRAM_NAME,1,12), DB2ID, START_TIMESTAMP
,DATE(START_TIMESTAMP)
,SQL_REQUESTS
,ELAPSED_TIME
,CPU_TIME
FROM CANDLE.H2ACCTPKGJSKFAP
WHERE START_TIMESTAMP > '2016-06-01-00.00.00.000000'
AND DB2ID = 'DB2B'
AND SYSID = 'SYSJ'
AND CONNID = 'CICS'

The MERGE done per ROWSET VALUE :

 MERGE INTO CANDLE.H2STATEMNTMONI STMT 
USING (
VALUES (
:MERGE2-PROGRAM-NAME-M
,:MERGE2-DB2ID-M
,:MERGE2-TIMESTAMP-M
,:MERGE2-SQLREQ-M
,:MERGE2-ELATIME-M
,:MERGE2-CPUTIME-M
)
FOR :MULTI-CNT-WRK ROWS
)
AS INP(
PROGRAM_NAME
,DB2ID
,READS_TIMESTAMP
,READS_NMB_EXEC
,READS_ELA_TIME
,READS_CPU_TIME
)
ON
(
STMT.PROGRAM_NAME =
INP.PROGRAM_NAME
AND
DATE(STMT.READS_TIMESTAMP) = DATE(INP.READS_TIMESTAMP)
AND
STMT.DB2ID = INP.DB2ID
AND
STMT.COLLID IN ('TESTB','TESTB TESTB')
AND STMT.REC_ID = 'I'
)
WHEN MATCHED THEN
UPDATE SET
STMT.READA_CPU_TIME = INP.READS_CPU_TIME
,STMT.READA_ELA_TIME = INP.READS_ELA_TIME
,STMT.READA_NMB_EXEC = INP.READS_NMB_EXEC
WHEN NOT MATCHED THEN
INSERT (
DB2ID ,
COLLID ,
PROGRAM_NAME ,
STMT_ID ,
REC_ID ,
READS_TIMESTAMP ,
READS_TIMEFROM ,
READA_CPU_TIME ,
READA_ELA_TIME ,
READS_CPU_TIME ,
READS_ELA_TIME ,
READS_NMB_EXEC
)
VALUES (
:MERGE2-DB2ID-M,
'TESTB TESTB',
:MERGE2-PROGRAM-NAME-M,
47114711,
'MI',
:MERGE2-TIMESTAMP-M,
:MERGE2-TIMESTAMP-M,
:MERGE2-CPUTIME-M,
:MERGE2-ELATIME-M,
:MERGE2-CPUTIME-M,
:MERGE2-ELATIME-M,
:MERGE2-SQLREQ-M)
NOT ATOMIC CONTINUE ON SQLEXCEPTION

Cursor to do DELETE :

WITH ROWSET POSITIONING FOR <---- ROWSET = 1000
SELECT
PROGRAM_NAME
,DB2ID
,READS_TIMESTAMP
,READS_NMB_EXEC
,READS_ELA_TIME
,READS_CPU_TIME
FROM
CANDLE.H2STATEMNTMONI
WHERE DATE(READS_TIMESTAMP) > '2016-06-01'
AND DB2ID = 'DB2B'
AND (
(
REC_ID = 'MI'
AND COLLID = 'TESTB TESTB'
AND READS_ELA_TIME < 0.5
AND READS_CPU_TIME < 0.5)
OR
(READA_ELA_TIME < 0.5
AND READA_CPU_TIME < 0.5
AND READA_ELA_TIME > 0
AND READA_CPU_TIME > 0 )
)
DELETE FROM CANDLE.H2STATEMNTMONI WHERE
CURRENT OF MERGE-CLEAN-CURSOR2
 

The MULTIROW FETCH-MERGE execution merged 130225 rows in 83 physical MERGE’s using 5 CPU seconds.

The DELETE done afterwards in a separate run took 8 CPU seconds for deleting 112305 rows in 112 physical DELETES.

Scenario 3 – DB2 12 MERGE on SELECT from Table 2 / one UPDATE WHEN MATCHED / separate DELETE :

 The SQL used :

 Combined SELECT and MERGE :

 MERGE INTO CANDLE.H2STATEMNTMONI STMT 
USING (
SELECT PROGRAM_NAME , DB2ID, START_TIMESTAMP
,DATE(START_TIMESTAMP)
,SQL_REQUESTS
,ELAPSED_TIME
,CPU_TIME
FROM CANDLE.H2ACCTPKGJSKFAP
WHERE START_TIMESTAMP > '2016-06-01-00.00.00.000000'
AND DB2ID = 'DB2B'
AND SYSID = 'SYSJ'
AND CONNID = 'CICS')
AS INP
ON
(
STMT.PROGRAM_NAME =
SUBSTR(INP.PROGRAM_NAME,1,12)
AND
DATE(STMT.READS_TIMESTAMP) =
DATE(INP.START_TIMESTAMP)
AND
STMT.DB2ID = INP.DB2ID
AND
STMT.COLLID IN ('TESTB','TESTB TESTB')
AND STMT.REC_ID = 'I'
)
WHEN MATCHED THEN
UPDATE SET
STMT.READA_CPU_TIME = INP.CPU_TIME
,STMT.READA_ELA_TIME = INP.ELAPSED_TIME
,STMT.READA_NMB_EXEC = INP.SQL_REQUESTS
,STMT.REC_ID = 'MU'
WHEN NOT MATCHED THEN
INSERT (
DB2ID ,
COLLID ,
PROGRAM_NAME ,
STMT_ID ,
REC_ID ,
READS_TIMESTAMP ,
READS_TIMEFROM ,
READA_CPU_TIME ,
READA_ELA_TIME ,
READS_CPU_TIME ,
READS_ELA_TIME ,
READS_NMB_EXEC
)
VALUES (
INP.DB2ID,
'TESTB TESTB',
SUBSTR(INP.PROGRAM_NAME,1,12),
47114711,
'MI',
INP.START_TIMESTAMP,
INP.START_TIMESTAMP,
INP.CPU_TIME,
INP.ELAPSED_TIME,
INP.CPU_TIME,
INP.ELAPSED_TIME,
INP.SQL_REQUESTS)

  Cursor to do DELETE :

  SELECT 
PROGRAM_NAME
,DB2ID
,READS_TIMESTAMP
,READS_NMB_EXEC
,READS_ELA_TIME
,READS_CPU_TIME
FROM
CANDLE.H2STATEMNTMONI
WHERE DATE(READS_TIMESTAMP) > '2016-06-01'
AND DB2ID = 'DB2B'
AND (
(
REC_ID = 'MI'
AND COLLID = 'TESTB TESTB'
AND READS_ELA_TIME < 0.5
AND READS_CPU_TIME < 0.5)
OR
(READA_ELA_TIME < 0.5
AND READA_CPU_TIME < 0.5
AND READA_ELA_TIME > 0
AND READA_CPU_TIME > 0 )
)
DELETE FROM CANDLE.H2STATEMNTMONI WHERE
CURRENT OF MERGE-CLEAN-CURSOR3  

The enhanced MERGE execution merged 130225 rows in one physical MERGE using 4 CPU seconds.

The DELETE done afterwards in a separate run took 8 CPU seconds for deleting 112305 rows in just as many physical DELETES.

Scenario 4 – DB2 12 MERGE on SELECT from Table 2 / UPDATE WHEN MATCHED / DELETE in the MERGE :

The SQL used :

Combined SELECT and MERGE with DELETE:

  MERGE INTO CANDLE.H2STATEMNTMONI STMT 
USING (
SELECT PROGRAM_NAME , DB2ID, START_TIMESTAMP
,DATE(START_TIMESTAMP)
,SQL_REQUESTS
,ELAPSED_TIME
,CPU_TIME
FROM CANDLE.H2ACCTPKGJSKFAP
WHERE START_TIMESTAMP > '2016-06-01-00.00.00.000000'
AND DB2ID = 'DB2B'
AND SYSID = 'SYSJ'
AND CONNID = 'CICS')
AS INP
ON
(
STMT.PROGRAM_NAME =
SUBSTR(INP.PROGRAM_NAME,1,12)
AND
DATE(STMT.READS_TIMESTAMP) =
DATE(INP.START_TIMESTAMP)
AND
STMT.DB2ID = INP.DB2ID
AND
STMT.COLLID IN ('TESTB','TESTB TESTB')
AND STMT.REC_ID = 'I'
)
WHEN MATCHED AND
( INP.CPU_TIME >= 0.5 OR
INP.CPU_TIME <= 0 )
OR
( INP.ELAPSED_TIME >= 0.5 OR
INP.ELAPSED_TIME <= 0 )
--
-- TO ENSURE WE ONLY UPDATE ROWS WITH HIGH CONSUMPTION
--
THEN
UPDATE SET
STMT.READA_CPU_TIME = INP.CPU_TIME
,STMT.READA_ELA_TIME = INP.ELAPSED_TIME
,STMT.READA_NMB_EXEC = INP.SQL_REQUESTS
,STMT.REC_ID = 'MU'
WHEN MATCHED
AND
NOT (( INP.CPU_TIME >= 0.5 OR
INP.CPU_TIME <= 0 )
OR
( INP.ELAPSED_TIME >= 0.5 OR
INP.ELAPSED_TIME <= 0 ))
--
-- THE NOT CLAUSE ABOVE ENSURES WE ONLY DELETE ROWS
-- WITH LOW CONSUMPTION
--
THEN
DELETE
WHEN NOT MATCHED AND
( INP.CPU_TIME >= 0.5 OR
INP.CPU_TIME <= 0 )
OR
( INP.ELAPSED_TIME >= 0.5 OR
INP.ELAPSED_TIME <= 0 )
--
-- TO ENSURE WE ONLY INSERT ROWS WITH HIGH CONSUMPTION
--
THEN
INSERT (
DB2ID ,
COLLID ,
PROGRAM_NAME ,
STMT_ID ,
REC_ID ,
READS_TIMESTAMP ,
READS_TIMEFROM ,
READA_CPU_TIME ,
READA_ELA_TIME ,
READS_CPU_TIME ,
READS_ELA_TIME ,
READS_NMB_EXEC
)
VALUES (
INP.DB2ID,
'TESTB TESTB',
SUBSTR(INP.PROGRAM_NAME,1,12),
47114711,
'MI',
INP.START_TIMESTAMP,
INP.START_TIMESTAMP,
INP.CPU_TIME,
INP.ELAPSED_TIME,
INP.CPU_TIME,
INP.ELAPSED_TIME,
INP.SQL_REQUESTS)

This version of the MERGE executed in 9 CPU seconds. It manipulated slightly fewer rows than the other scenarios, because we avoid doing some inserts that we (in the other scenarios) would have to delete later !

Please notice that the conditions tend to become pretty complex to meet the rule in the enhanced MERGE that a given row can only be manipulated once (ATOMIC). I struggled a bit getting this right, and I can see that in some cases it might be difficult to ensure that it works as desired. As seen above – and this might even be a simple scenario – the conditions do nothing good for readability, and the testing will also be more complicated. It would have been easier to write the SQL if the MATCHED clauses were applied in sequence even when the row had already been updated, as we could then have avoided most of the conditions on the UPDATE parts of the WHEN MATCHED and only used them on the DELETE part.
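One way to tame the repeated conditions – a sketch only, not how the tests above were run – is to classify each source row once in the USING fullselect and then test that flag in every WHEN clause. The derived column LOW_COST below is hypothetical, and rows with NULL metrics end up in the “not low cost” branch, which differs slightly from the predicates used above:

MERGE INTO CANDLE.H2STATEMNTMONI STMT
USING (
SELECT PROGRAM_NAME, DB2ID, START_TIMESTAMP
,SQL_REQUESTS
,ELAPSED_TIME
,CPU_TIME
-- LOW_COST = 1 marks rows with insignificant consumption
,CASE WHEN CPU_TIME > 0 AND CPU_TIME < 0.5
       AND ELAPSED_TIME > 0 AND ELAPSED_TIME < 0.5
      THEN 1 ELSE 0 END AS LOW_COST
FROM CANDLE.H2ACCTPKGJSKFAP
WHERE START_TIMESTAMP > '2016-06-01-00.00.00.000000'
AND DB2ID = 'DB2B'
AND SYSID = 'SYSJ'
AND CONNID = 'CICS')
AS INP
ON
(
STMT.PROGRAM_NAME = SUBSTR(INP.PROGRAM_NAME,1,12)
AND DATE(STMT.READS_TIMESTAMP) = DATE(INP.START_TIMESTAMP)
AND STMT.DB2ID = INP.DB2ID
AND STMT.COLLID IN ('TESTB','TESTB TESTB')
AND STMT.REC_ID = 'I'
)
WHEN MATCHED AND INP.LOW_COST = 0 THEN
UPDATE SET
STMT.READA_CPU_TIME = INP.CPU_TIME
,STMT.READA_ELA_TIME = INP.ELAPSED_TIME
,STMT.READA_NMB_EXEC = INP.SQL_REQUESTS
,STMT.REC_ID = 'MU'
WHEN MATCHED AND INP.LOW_COST = 1 THEN
DELETE
WHEN NOT MATCHED AND INP.LOW_COST = 0 THEN
INSERT (
DB2ID , COLLID , PROGRAM_NAME , STMT_ID , REC_ID ,
READS_TIMESTAMP , READS_TIMEFROM ,
READA_CPU_TIME , READA_ELA_TIME ,
READS_CPU_TIME , READS_ELA_TIME , READS_NMB_EXEC
)
VALUES (
INP.DB2ID, 'TESTB TESTB', SUBSTR(INP.PROGRAM_NAME,1,12), 47114711, 'MI',
INP.START_TIMESTAMP, INP.START_TIMESTAMP,
INP.CPU_TIME, INP.ELAPSED_TIME,
INP.CPU_TIME, INP.ELAPSED_TIME, INP.SQL_REQUESTS)

Whether this is more readable is a matter of taste, but it does keep the classification of a row in one place and makes the mutual exclusion of the WHEN clauses easier to see and to test.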

My conclusion is therefore that if the business need (like in this application) dictates that a row can both be updated (perhaps multiple times) and deleted based upon the updated value, it will make the MERGE more complicated to build, read and maintain.

I will also claim that “converting” a “pre DB2 12” MERGE (NOT ATOMIC) to a DB2 12 enhanced MERGE (ATOMIC) might not be trivial if rows were updated multiple times in the old variation.

All these 4 variations were executed and the performance figures plotted – please remember that scenario 4 manipulates fewer rows than the other scenarios. But the variation is so small that it should not make a measurable difference.

 [Figure: performance comparison of the four scenarios (merge1.png)]

 

 During all these tests I also looked at the options as indicated earlier in the document : 

  1. It was verified that a row can only be “touched” once, meaning that you cannot operate on a row with several UPDATEs or DELETEs. IBM has confirmed that this is dictated by the ANSI standard, but I had the need right away in my scenario – and could not do it. I would have loved to UPDATE a given row and later do a WHEN MATCHED AND “something” to DELETE the row anyway. Please see my comments concerning the ANSI standard under the bullet “I would have loved”.
  2. The phrase ELSE IGNORE was verified and tested
  3. The phrase SIGNAL was verified. I got a big surprise here: I would have expected that, having updated a row in one WHEN MATCHED condition, I could do another WHEN MATCHED AND “something” later in the MERGE that throws a SIGNAL like SIGNAL SQLSTATE ‘70002’ SET MESSAGE_TEXT = ‘UPDATED VALUE WAS TOO HIGH’ – but this is not allowed. IBM has confirmed that this is the ANSI standard. Please see my comments concerning the ANSI standard under the bullet “I would have loved”.
  4. The possibility of doing SELECT … FROM FINAL TABLE needs a chapter for itself.

 General comments about Enhanced MERGE and COMMIT.

 After working with the enhanced MERGE, it is clear that this is an extremely powerful statement. Though a few restrictions limit it a bit, I will claim that logic which previously required a whole application can now be contained in one MERGE statement.

 I take this as a positive from an application point of view, looking at the time to develop and the ease of reading the application afterwards. However, from an operations and DBA point of view, things are a bit more complex.

 In my scenario, I can now write my application – more or less – as one single statement that manipulates millions of rows. The problem is that I can see this will cause trouble when we talk about COMMITs and thereby the number of locks that can be held on a table at a given point in time. Particularly in batch programming we use cursors WITH HOLD and call our COMMIT-control system for every n’th row to ensure concurrency on the objects. If someone now uses an enhanced MERGE that goes into DB2 and manipulates millions of rows, chances are that lock escalation will occur on the table – and other locks that the application has taken before the MERGE will be held for a very long time. In addition, should the statement eventually be rolled back, this also risks operational problems for the availability of our systems. Application people are not always able to judge how many rows a statement can process in production – now and in the future. Therefore, the risk of huge enhanced MERGE’s resulting in contention problems is indeed real !

From an operational point of view, we need to ensure that the MERGE is “sliced” into smaller units. It is exactly the same discussion that led to DB2 12 delivering the PieceMeal DELETE.

When we look at how this can be achieved, one possibility is to add FETCH FIRST n ROWS ONLY to the “inner” SELECT. This ensures that a known number of rows enter the MERGE processing – but as this involves UPDATE’s/INSERT’s and DELETE’s, it can be difficult to build a WHERE clause that ensures that the next execution, for the next set of rows, starts where the prior one stopped.

It seems that DB2 12 SQL pagination is not supported on the MERGE – it might otherwise have helped – so we are back to the fact that we can break the MERGE after n rows but have problems “restarting” it.

We can embed the MERGE in a SELECT … FROM FINAL TABLE (MERGE …) and in this way catch the value we could use in the WHERE clause for the next iteration of the MERGE. It could then be a scenario like this :

MOVE ‘0001-01-01-00.00.00.000000’ TO WORK-TIMESTAMP

DO UNTIL NO MORE ROWS
SELECT MY_TIMESTAMP_COL FROM FINAL TABLE (
MERGE INTO TABLE1 T1 USING (
SELECT MY_TIMESTAMP_COL FROM TABLE2
WHERE MY_TIMESTAMP_COL > :WORK-TIMESTAMP
FETCH FIRST 1000 ROWS ONLY) AS T2
ON (T1.MY_TIMESTAMP_COL = T2.MY_TIMESTAMP_COL)
WHEN MATCHED THEN UPDATE SOMETHING
WHEN NOT MATCHED THEN INSERT SOMETHING
)
MOVE MY_TIMESTAMP_COL TO WORK-TIMESTAMP

COMMIT
END DO
 

Or even better :

MOVE ‘0001-01-01-00.00.00.000000’ TO WORK-TIMESTAMP

DO UNTIL NO MORE ROWS
SELECT MAX(MY_TIMESTAMP_COL) FROM FINAL TABLE (
MERGE INTO TABLE1 USING (
SELECT MY_TIMESTAMP_COL FROM TABLE2
WHERE MY_TIMESTAMP_COL > :WORK-TIMESTAMP
FETCH FIRST 1000 ROWS ONLY) AS INP
ON (TABLE1.MY_TIMESTAMP_COL = INP.MY_TIMESTAMP_COL)
WHEN MATCHED THEN UPDATE SOMETHING
WHEN NOT MATCHED THEN INSERT SOMETHING
)
MOVE MAX(MY_TIMESTAMP_COL) TO WORK-TIMESTAMP
COMMIT
END DO
 

This will in fact work – BUT the problem is not solved – because the SELECT … FROM FINAL TABLE syntax is rejected if there is a DELETE embedded in the MERGE.
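To make the restriction concrete, the shape below – sketched in the same pseudocode style as above, with hypothetical names and a placeholder condition – is rejected solely because a DELETE clause sits inside the MERGE that FINAL TABLE wraps:

SELECT MAX(MY_TIMESTAMP_COL) FROM FINAL TABLE (
MERGE INTO TABLE1 USING (
SELECT MY_TIMESTAMP_COL FROM TABLE2
WHERE MY_TIMESTAMP_COL > :WORK-TIMESTAMP
FETCH FIRST 1000 ROWS ONLY) AS INP
ON (TABLE1.MY_TIMESTAMP_COL = INP.MY_TIMESTAMP_COL)
WHEN MATCHED AND <low consumption> THEN DELETE      <-- this DELETE makes DB2 reject the statement
WHEN MATCHED THEN UPDATE SOMETHING
WHEN NOT MATCHED THEN INSERT SOMETHING
)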

Therefore, I see no solution to this potential problem, and I am certain that it will “hit” many installations in the coming years if the enhanced MERGE becomes popular.

It surprised me very much to find myself in a situation where it is difficult to see any smart solution. I had hoped it would be allowed to have a SELECT INTO even when there is a DELETE in the statement. I can see a discussion about whether the deleted rows should appear in the result set – but for the restart discussion it really does not matter, as they will not be found anyway on the next execution of the next portion of the MERGE. At least the SELECT MAX/MIN variation should be possible…. IBM has confirmed that this restriction is indeed ANSI standard. Please see my comments concerning the ANSI standard under the bullet “I would have loved”.

Even if we do get this functionality, there will still be a risk of problems, as the application people will have to use this looping technique – but at least it would then be possible to slice the MERGE into pieces surrounded by COMMITs. And the SELECT … FROM FINAL TABLE would be comparable with the other statements that support it.

I would have loved :

 It is impressive that the “only” problems I found when verifying the Enhanced MERGE in DB2 12 are all issues dictated by the ANSI standard. My comments here are therefore more aimed at the work in the ANSI standard than towards IBM DB2. As I wrote earlier, we do want DB2 to adhere to the standards to be portable. Still I feel that the issues are quite important.

 So I would have loved : 

  1. That a row could both be UPDATED in a WHEN MATCHED and DELETED in another WHEN MATCHED.
  2. That it would be possible to do a SIGNAL on a WHEN MATCHED even though the same row was manipulated in another WHEN MATCHED
  3. I fear problems with “runaway” MERGE’s. If it were possible to allow SELECT … FROM FINAL TABLE even when the MERGE contains a DELETE, we could live with it. The same applies if SQL pagination were supported on the MERGE.

The conclusion :

The enhanced MERGE seems to work as designed. It is however suddenly a very powerful statement that will be able to do a lot of processing.

The performance – all else being equal – is much better with an enhanced MERGE compared to the processing needed today. Simply the fact that the data rows no longer need to be returned cross-memory to the application gives a huge saving, as seen above.

It will be easier to write an application (and to port an application) that suits the enhanced MERGE. And the readability afterwards is very much improved – no matter the programming language !

The statements under the bullet “I would have loved” cover some concerns that I have, and they do in fact worry me quite a bit.

As seen above they will all (unfortunately) require changes to the ANSI standard and are therefore beyond reach currently – as we surely need to adhere to the ANSI standard to ensure portability of the code !!!

Nevertheless, the statement - as it is laid out for us - is impressively error free and will be very welcome. I am sure it will make the MERGE more popular at Bankdata – if we can control the COMMIT-frequency !!!

In general, we had an extremely positive cooperation with IBM and the other customers in the DB2 12 ESP project. I think the user involvement and the openness that DB2 has shown in recent years is one of DB2’s big strengths in the database market. The user communities have a big influence on what is being delivered and how – and I expect that the new continuous delivery model will give us more features on a much shorter timeline – to mutual benefit !!

 

 

 
