
Understanding the DB2 LUW Backup Performance Statistics

By Dale McInnis posted Aug 17, 2016 07:26 AM

  

Understanding DB2’s Backup and Restore Statistics

In an effort to help customers determine where performance bottlenecks exist in DB2 backup and restore operations, DB2 writes an extremely useful entry to the db2diag.log file. It was first introduced in V9.7 under registry variable control (DB2_BAR_STATS) and is enabled by default as of V10.1 FP2 (the registry variable is no longer required).
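On releases prior to V10.1 FP2, the entry can be turned on by setting the registry variable and recycling the instance; a typical sequence (instance-wide scope assumed) looks like this:

            db2set DB2_BAR_STATS=ON
            db2stop
            db2start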

Each successful backup and restore operation generates a single record in the db2diag.log file, which provides information on the performance of that operation. The log record is informational and is written at diaglevel 3 (the default) as well as diaglevel 4. With the level of detail provided, one can get a very clear picture of where each thread spent its time while the operation was running.  An example of a backup performance statistics db2diag.log entry is as follows:

2016-01-20-11.07.26.882298-300 E243498A1912         LEVEL: Info
PID     : 11927628             TID : 2058           PROC : db2sysc
INSTANCE: dmcinnis             NODE : 000           DB   : PERF
APPHDL  : 0-28                 APPID: *LOCAL.dmcinnis.160120154344
AUTHID  : DMCINNIS             HOSTNAME: db2hadev
EDUID   : 2058                 EDUNAME: db2agent (PERF)
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 1414 bytes
 
Parallelism       = 10
Number of buffers = 10
Buffer size       = 10489856 (2561 4kB pages)
                                                                 
BM#    Total     I/O       MsgQ        WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000   1420.79     27.63   1269.31    122.62         789      7860
001   1420.74     53.90   1363.71       0.45        1683     16830
002   1420.74     51.27   1331.79     35.27        1508     15074
003   1420.74     39.19   1301.87     77.89        1118     11171
004   1420.74     32.95   1280.14    106.14         951      9507
005   1420.74     39.45   1223.28    156.55         932      9311
006   1420.74     31.31   1214.64    173.34         912      9114
007   1420.74     44.01   1260.33    114.55        1162     11615
008   1420.74     37.83   1246.14    135.36         897      8964
009   1420.74     27.64   1242.88    149.02         768      7676
---  --------  --------  --------  --------  --------  --------
TOT  14207.48    385.25  12734.14   1071.23   10720    107125
 
MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000   1421.51   1420.33      0.41      0.00       10721    107231
---  --------  --------  --------  --------  --------  --------
TOT   1421.51   1420.33      0.41      0.00       10721    107231
 
The command issued to generate this db2diag.log entry was:

           

            backup db perf use TSM
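If the diagnostic log is busy, the db2diag tool can filter these records out by message text; for example (the filter string simply matches the MESSAGE field shown above):

            db2diag -g "message:=Performance statistics"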

 

The meanings of the various columns for the DB2 utility performance statistics are as follows:

Column     Meaning
-------    -------
BM# / MC#  Buffer manipulator / media controller ID. BM threads read the data
           from the table spaces; MC threads write the data to the target device.
Total      Total elapsed time for the thread, in seconds.
I/O        Time spent reading data (BMs) or writing data (MCs), in seconds.
MsgQ       Time spent waiting for a buffer, in seconds.
WaitQ      Time spent waiting for control messages, in seconds.
Buffers    Number of buffers processed by the thread.
MBytes     Amount of data processed, in megabytes.

 

How can this information be interpreted?

By using the I/O, MsgQ and WaitQ values we can easily determine where the operation spent most of its time: I/O divided by Total gives the percentage of time spent performing I/O, MsgQ divided by Total gives the percentage of time spent waiting for buffers, and WaitQ divided by Total gives the percentage of time spent waiting for control messages. A simple spreadsheet allows the data to be easily interpreted. In the case above the spreadsheet would look like this:

          I/O %    MsgQ %   WaitQ %
BM avg     2.71     89.63      7.54
MC avg    99.92      0.03      0.00

(averages computed from the TOT rows above)
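As a worked example, the first BM row gives 27.63 / 1420.79 ≈ 1.9% of the time performing I/O, 1269.31 / 1420.79 ≈ 89.3% waiting for buffers, and 122.62 / 1420.79 ≈ 8.6% waiting for control messages.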

What does this data tell us?

This example shows that the BM threads spent, on average, only 2.71% of their time reading data from the database, whereas they spent 89.63% of their time waiting for buffers. We also see that the MC thread spent, on average, 99.92% of its time performing I/O (in this case, waiting for TSM to respond). Since an MC cannot free up a buffer until TSM confirms it has been written, and the BMs have to wait for the MCs to free the buffers, we can conclude that the bottleneck is in writing to the target device. Based on the backup start and backup stop db2diag.log entries, we know that the overall elapsed time for the backup was 23 minutes 41 seconds.
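As a consistency check, 23 minutes 41 seconds is 1421 seconds, which lines up with the MC Total of 1421.51 seconds shown above.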

How can you use this information to improve backup performance?

In the previous example there are two approaches that could be used to improve backup performance: allocating more buffers, so that the BM threads do not have to wait as long, or increasing the number of target devices (in this case, providing more parallel TSM sessions). Let's look at the effect of adding one additional TSM session, which can be done by executing a command that looks like this:

           

            backup db perf use TSM open 2 sessions
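The first approach, giving the utility more buffers, can also be requested explicitly on the BACKUP command; the values below are illustrative only, since when they are left unspecified DB2 chooses them autonomically:

            backup db perf use TSM open 2 sessions with 20 buffers buffer 2561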

 

Backup performance statistics recorded in the db2diag.log file would now look something like this:

 

2016-01-20-11.18.08.651897-300 E252008A2571         LEVEL: Info
PID     : 11927628             TID : 16557          PROC : db2sysc
INSTANCE: dmcinnis             NODE : 000           DB   : PERF
APPHDL  : 0-43                 APPID: *LOCAL.dmcinnis.160120160849
AUTHID  : DMCINNIS             HOSTNAME: db2hadev
EDUID   : 16557                EDUNAME: db2agent (PERF)
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 2073 bytes
 
Parallelism       = 19
Number of buffers = 19
Buffer size       = 5246976 (1281 4kB pages)
                                                                
BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000    557.12      0.15      0.14    556.81           4         0
001    556.66     88.66    464.83      0.45        3366     16830
002    556.66     86.43    459.99      7.80        3015     15074
003    556.66     68.08    435.45     51.31        2235     11171
004    556.66     58.99    416.10     80.02        1902      9507
005    556.66     61.79    412.72     80.62        1863      9311
006    556.66     58.42    406.24     90.53        1796      8979
007    556.66     54.30    393.00    107.95        1705      8521
008    556.66     49.11    357.02    149.31        1488      7435
009    556.66     33.14    286.32    236.34        1048      5236
010    556.66     27.67    256.55    271.71         882      4406
011    556.66     17.47    192.77    345.92         598      2985
012    556.66     14.87    163.41    377.97         489      2440
013    556.66      9.95    114.45    431.98         329      1643
014    556.66      8.44    101.43    446.52         306      1529
015    556.66      7.99    103.17    445.25         291      1450
016    556.66      3.05     31.20    522.32          94       469
017    556.66      1.16      8.63    546.84          27       134
018    556.66      0.13      0.46    556.06           1         0
---  --------  --------  --------  --------  --------  --------
TOT  10577.00    649.89   4603.99   5305.79       21439    107125
 
MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000    557.12    553.32      3.01      0.01       10628     53176
001    557.14    553.07      2.85      0.39       10813     54097
---  --------  --------  --------  --------  --------  --------
TOT   1114.26   1106.39      5.87      0.41       21441    107273
 
When we put these numbers in a spreadsheet, we get:

 

          I/O %    MsgQ %   WaitQ %
BM avg     6.14     43.53     50.16
MC avg    99.29      0.53      0.04

(averages computed from the TOT rows above)

 

The elapsed time decreased to 9 minutes 17 seconds, and the average time the BM threads spent waiting for buffers decreased to 43.53%. This is definitely an improvement; however, the BM threads are still spending a lot of time waiting for buffers. One approach that could provide some relief is to further increase the number of TSM sessions.

 

Does increasing the number of target sessions always help?

 

We can increase the number of sessions again to see if it lowers the overall elapsed time even further. The command used to do this is:

           

            backup db perf use TSM open 4 sessions

 

Now, backup performance statistics recorded in the db2diag.log file might look like this:

 

2016-01-20-12.02.38.004368-300 E261177A2703         LEVEL: Info
PID     : 11927628             TID : 16814          PROC : db2sysc
INSTANCE: dmcinnis             NODE : 000           DB   : PERF
APPHDL  : 0-56                 APPID: *LOCAL.dmcinnis.160120164909
AUTHID  : DMCINNIS             HOSTNAME: db2hadev
EDUID   : 16814                EDUNAME: db2agent (PERF)
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 2205 bytes
 
Parallelism       = 19
Number of buffers = 19
Buffer size       = 5246976 (1281 4kB pages)
                                                                 
BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000    806.36      0.26      0.04    806.05           4         0
001    805.81     75.13    727.40      0.62        3366     16830
002    805.81     69.72    722.34     11.37        3015     15074
003    805.81     53.14    686.56     64.31        2235     11171
004    805.81     45.70    630.90    127.67        1902      9507
005    805.81     48.07    654.78    101.47        1863      9311
006    805.81     45.90    654.31    104.14        1796      8979
007    805.81     45.55    605.23    153.63        1705      8521
008    805.81     40.52    486.75    277.29        1488      7435
009    805.81     33.03    301.06    470.84        1048      5236
010    805.81     28.26    222.79    554.00         882      4406
011    805.81     20.67    157.57    627.03         598      2985
012    805.81     17.59    127.94    659.82         489      2440
013    805.81     12.35     94.41    698.75         329      1643
014    805.81     12.50     85.82    707.20         306      1529
015    805.81     10.74     87.29    707.50         291      1450
016    805.81      4.42     27.02    774.28          94       469
017    805.81      1.63      9.11    795.03          27       134
018    805.81      0.40      0.08    805.32           1         0
---  --------  --------  --------  --------  --------  --------
TOT  15311.08    565.66   6281.50   8446.40       21439    107125
 
MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------  --------  --------
000    806.39    805.45      0.16      0.00        5062     25319
001    806.36    805.09      0.08      0.33        6522     32630
002    806.32    804.92      0.06      0.31        4972     24874
003    806.34    804.74      0.06      0.30        4887     24449
---  --------  --------  --------  --------  --------  --------
TOT   3225.43   3220.21      0.36      0.95       21443    107273
 
Increasing the number of TSM sessions to 4 actually had a negative effect, with an elapsed time of 13 minutes and 27 seconds. Note that the DB2 autonomics did not alter the parallelism, the number of buffers or the buffer size when the number of TSM sessions was increased; it did, however, decrease the average time that the BMs spent waiting for buffers. When we put these numbers in a spreadsheet, we get:

          I/O %    MsgQ %   WaitQ %
BM avg     3.69     41.02     55.17
MC avg    99.84      0.01      0.03

(averages computed from the TOT rows above)

 

We can see that the average time BMs spent waiting for buffers decreased, and the MCs continued to spend all of their time writing.  If increasing the number of sessions does not help, then perhaps we need to increase the size of the buffers.

What effect does increasing the util_heap_sz have?
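The utility heap is controlled by the util_heap_sz database configuration parameter; for the PERF database used in these examples, it can be doubled with a command like this:

            update db cfg for perf using util_heap_sz 60000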

By altering the database configuration to increase the util_heap_sz from 30000 to 60000 (4 kB pages), the following results were achieved:

[Spreadsheets: BM and MC time breakdowns for the two re-runs with the doubled utility heap]

The util_heap_sz configuration parameter affects both the number of buffers and the buffer size. By doubling util_heap_sz we obtained the following results:

          Using 2 TSM sessions completed in 15 minutes and 8 seconds

          Using 4 TSM sessions completed in 8 minutes and 2 seconds

In both cases the new buffer size chosen was 10489856 bytes, or 2561 4 kB pages.
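(As a quick check: 2561 pages × 4096 bytes = 10,489,856 bytes.)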

Conclusion

The DB2 autonomics engine does an outstanding job of selecting the number of buffers, the buffer size and the parallelism (the number of BM threads); however, there are factors that you can tune to greatly impact the performance of the utility. The two most important ones were discussed in this paper; the following one is just as important, but cannot be controlled by tuning parameters.

Although not demonstrated in this paper, one of the most influential factors is the distribution of the data. Each BM thread is assigned an entire table space; this single-threaded architecture has had, and continues to have, the largest impact on backup performance. Ideally the data in the database would be evenly distributed across all of the table spaces, so that all of the BM threads are kept equally busy. To improve backup performance, use the ADMIN_MOVE_TABLE procedure to distribute the data more evenly across the table spaces. Ideally all of the BM threads should complete at approximately the same time, whereas with unevenly distributed data a few threads will continue to run while the remaining threads sit idle. The DB2 autonomics engine examines the number of CPUs on the system and the number of table spaces defined in the database to set the parallelism appropriately.
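As a sketch only, a table can be relocated online with the ADMIN_MOVE_TABLE procedure; the schema, table and table space names below are hypothetical, and the exact parameter list should be verified against the documentation for your DB2 level:

            call sysproc.admin_move_table(
                'MYSCHEMA',          -- table schema (hypothetical)
                'BIGTABLE',          -- table to move (hypothetical)
                'TBSP_DATA2',        -- target data table space
                'TBSP_IDX2',         -- target index table space
                'TBSP_LOB2',         -- target long/LOB table space
                '', '', '', '', '',  -- remaining options left at their defaults
                'MOVE')              -- perform all move phases in one call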

 

The second most influential factor is the size of the utility heap (util_heap_sz). By default, an online backup will consume no more than 50% of the available space in the utility heap, whereas an offline backup will consume up to 95% of it. This value influences both the number of buffers and the size of each buffer allocated. Unless the DB2 server is memory constrained, a util_heap_sz of at least 50,000 (4 kB pages) should be used.
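A rough worked check against the two-session run above, with the 30,000-page heap in effect for that run: 19 buffers × 1281 pages = 24,339 pages, or about 81% of the heap, which is consistent with the 95% ceiling for an offline backup (none of the example commands used the ONLINE keyword).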

 

The third most influential factor is the number of target sessions or devices used. DB2 is unable to determine whether the target device for a backup is a sequential or random access device; consequently, DB2 will not autonomically compute an optimal number of devices to use. The utilities rely on the caller to specify a "reasonable" value, and, as was demonstrated in the previous examples, increasing the number of target devices will not always result in a lower overall elapsed time. Experimentation is the only method available to determine the optimal number of devices to use. If backing up to a SAN, there can be a benefit to having DB2 write to multiple files on the same file system. This is achieved by specifying the same target location multiple times, e.g.:

 

            backup db perf to /db2/backup, /db2/backup, /db2/backup

 

This command will result in three backup objects being created in the /db2/backup file system, similar to adding multiple TSM sessions as demonstrated previously in this paper.
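The objects recorded for the backup can be confirmed afterwards from the recovery history file, e.g.:

            list history backup all for perf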

 

 
