Multi-row insert on DB2 for z/OS with code examples - by Emil Kotrc

Multi-row insert on DB2 for z/OS with code examples

Emil Kotrc, Emil.Kotrc@ca.com

Have you ever needed to insert many rows into a DB2 table from your application? What are your options in such scenario? Depending on the nature of your data and application:

●     you can just issue many INSERT statements in your program (either static or dynamic)
●     or you can INSERT via SELECT from other table or view
●     or you can use multi-row insert facility

Please note that there might be some other means that could be used in very specific case, like LOAD, clone tables, but let’s keep them away from this blog and focus on a single-row insert vs multi-row insert. 

What’s the main difference between single-row and multi-row inserts? Simply the performance. If you are inserting many rows in a loop via a single-row insert, you call DB2 for every single row, which means you spend some time for the communication with the DB2. On the other hand, if you use multi-row insert instead, you are passing more data within one call and thus reducing the DB2 calls. The reduction of the DB2 calls and traffic is the crucial difference between these two flavors of inserts.

IBM introduced the multi-row inserts for DB2 for z/OS in version 8 in 2004. If you open the performance topics redbook for DB2 8, you will find the statement saying that the multi-row operations can save up to 40% of CPU time compared to single-row insert. The number has probably changed since that time as there were even some performance improvements for single-row inserts, but the performance gain can still be significant these days.

As the multi-row insert may require an extra work in an application, one might tend to avoid it, but multi-row insert will pay off. Let’s look at the options for multi-row inserts from the simplest case using a static SQL in C, to a dynamic SQL in C, up to a dynamic SQL in Java.

Data

As usual, we need some data to play with. We don’t need anything big here, so how to prepare some super simple testing data? The answer is pretty simple, let’s generate some random data. Let our table have just two columns - an integer and a character string. DDL for our table is (assuming implicit tablespace and database):

CREATE TABLE MROWS (COL1 INTEGER NOT NULL, COL2 CHAR(8) NOT NULL)

Multi-row insert needs the input in the form of a host variable array for each column. To generate random data for each column, we will use the means provided by the standard C and Java libraries. In C we will use function rand() to generate an integer and to generate a random string we will use the function random_str() provided in a listing below. Instead of reading the data from a file, we will just generate them on the fly and populate the variable arrays.

Static SQL

The simplest case is to use multi-row insert in a static SQL. This can be useful for applications that work with the tables of known structure and that work with arrays of host variables. In this case you just need host variable array for each column, fill each array with n values (one per row), and call INSERT FOR n ROWS, where n can be an integer constant or a host variable representing the number of rows you want to insert.

For completeness, I provide full listing of the C source code, the description follows below.

#include <stdio.h>
#include <stdlib.h>                                           
#include <string.h>                                           

EXEC SQL INCLUDE SQLCA;                                       
 

/* generate a random string of num characters */

char* random_str(char* out, int num)
{                                                             
  const char chars[] = "abcdefghijklmnopqrstuvwxyz";          
  int i = 0;                                                  
  for (i = 0; i < num; i++) {                                 
    out[i] = chars[rand()%strlen(chars)];                     
  }                                                           
  out[i] = '\0';                                              
  return out;                                                  
}                                                             

/* check the SQLCODE and print a message if an error occurs */

int check_sqlcode()                                        
{                                                           
  if (SQLCODE < 0) {                                       
    EXEC SQL BEGIN DECLARE SECTION;                        
    char msg[32673];                                       
    char msgid[11];                                         
    long sqlcode;                                          
    EXEC SQL END DECLARE SECTION;                          
    EXEC SQL GET DIAGNOSTICS CONDITION 1                   
      :msg   = MESSAGE_TEXT,                               
      :msgid = DB2_MESSAGE_ID,                             
      :sqlcode = DB2_RETURNED_SQLCODE;                      

    printf("SQLCODE %d: %s %s\n", sqlcode, msgid, msg);    

    return 8;                                              
  } else
  { 
    return 0;         }                                                         

}                                                          

int main(int argc, char **argv)                            
{                                                           
  EXEC SQL BEGIN DECLARE SECTION;                          
  short nrows;                                             
  long int hva1[10];          /* data for 1st column */                                       
  char     hva2[10][9];      /* data for 2nd column */                              
  EXEC SQL END DECLARE SECTION;                             

  nrows = 10; /* let nrows be 10 in our case */                                            

  /* initialize host variable arrays */                                                           
  for (int i = 0; i < nrows; i++) {                        
    hva1[i] = rand();                                      
    random_str(hva2[i], 8);                       
  }                                                

  /* insert the rows */                                                  
  EXEC SQL                                        
    INSERT INTO MROWS VALUES (:hva1, :hva2)
    FOR :nrows ROWS;                              
  return check_sqlcode();                                                                 
}                                                  

The source code contains two helper functions: random_str() for generating random string; and check_sqlcode() for checking the SQLCODE (note the use of GET DIAGNOSTICS). The main part is however implemented in the main() function. In the DECLARE SECTION, three host variables are defined:

●     nrows - designating the number of rows passed in a single INSERT. In this case, we use 10 rows.
●     hva1 - is a 10 integer array for data of the first column.
●     hva2 - is an array of 10 character string for the second column. Remember we are in C and as we will use NULL-terminated strings, the strings will be 9 bytes for 8 bytes column. 

The for loop then initializes the host variable arrays with random values. And finally we are approaching the SQL statement. The INSERT statement contains the FOR :nrows ROWS clause declaring the multi row insert. The VALUES clause must use the host variable arrays, with dimensions equal to nrows. Please note, that if an application will always submit the constant number of rows, we could use FOR n ROWS clause, where n is an integer constant not a host variable. Other comment is that since we are allocating the fixed arrays, we could only decrease the number of nrows, we cannot increase it since we simply don’t have that big host array.

You’ve probably noticed that the application does not connect to DB2 explicitly, I am using the TSO attachment to keep it simple.

Diagnostics information

You may also wonder what will happen if any of the rows will not be inserted for any reason (for example due to an unique or check constraint). You have two options -

  1. execute the multi-row insert as atomic using the ATOMIC clause in the insert;
  2. or execute it as NOT ATOMIC using the same named clause.

The difference between ATOMIC and NOT ATOMIC is that the atomic behaves as “all or nothing”; i.e. either all rows are successfully inserted, or no row is inserted if the insertion of any row fails; this is the default behavior. The NOT ATOMIC treats every row separate, so it will continue inserting the rows even if any of them fails. So keep these two options in mind, when you are designing your application. (Please note that NOT ATOMIC is allowed only for static SQL, if you want to use it in dynamic SQL, you would need to provide it to the PREPARE statement.)

Now, how would you check the execution of multi-row insert? Good old friend SQLCA will give you some answers - check SQLCODE and SQLERRD3 fields. SQLCODE gives you the summary of the the processing:

●     SQLCODE +252. All rows were inserted, but one or more warnings occured.
●     SQLCODE -253. At least one row was successfully inserted, but one or more errors occurred. Some warnings might also have occurred.
●     SQLCODE -254. No row was inserted. One or more errors occurred while trying to insert multiple rows of data.
●     SQLCODE -20252. More errors occurred that DB2 is capable of recording. Statement processing is terminated.
●     SQLERRD3 indicates how many rows were inserted, so you can easily check whether all of your rows were inserted. 

If you used a non-atomic insert and some of the rows failed, you need to use GET DIAGNOSTICS SQL statement to check the diagnostics for each individual row.

Dynamic SQL with PREPARE and EXECUTE

The previous case was fine, but what if you wanted to insert the rows into a table provided by the user, but still with the agreed structure? The table name can just be dynamic in this case, but the structure is static.

We need a dynamic SQL in this example. As we are using host variables, we need a PREPARE and EXECUTE SQL statements; EXECUTE IMMEDIATE cannot be used in this case. Also, if we wanted to perform inserts repeatedly, we would do prepare once and then execute EXECUTEs multiple times.

The full listing follows, except the helper functions random_str() and check_sqlcode() as they are the same as in the previous example:

#include <stdio.h>                                             
#include <stdlib.h>                                            
#include <string.h>                                            

EXEC SQL INCLUDE SQLCA;                                          

int main(int argc, char **argv)                               
{                                                             
  EXEC SQL BEGIN DECLARE SECTION;                             
  short nrows;  /* number of rows to insert */                                                 
  struct {short len; char sqlstmt[256];} sqlstmt; /* SQL text for PREPARE*/             
  char attrvar[257];  /* multi row attribute for PREPARE */                                          
  long int hva1[10];  /* data for 1st column */                                         
  char     hva2[10][9]; /* data for 2nd column */                                       

  EXEC SQL END DECLARE SECTION;                               

  int rc = 0;                                                  
  nrows = 10;                                                                                                                      

  /* initialize host variables */
  for (int i = 0; i < nrows; i++) {                                 
    hva1[i] = rand();                                               
    random_str(hva2[i], 8);                                         
 }                                                                  

  /* prepare the INSERT text */                                                                   
  strcpy(sqlstmt.sqlstmt, "INSERT INTO MROWS VALUES (?, ?)");
  sqlstmt.len = strlen(sqlstmt.sqlstmt);                             

  /* prepare the multi-row attribute for PREPARE */

  strcpy(attrvar, "FOR MULTIPLE ROWS ATOMIC");                       

  /* PREPARE the multi-row INSERT */                                                                 
  EXEC SQL PREPARE my_insert ATTRIBUTES :attrvar FROM :sqlstmt;     
  rc += check_sqlcode();                                               

  /* EXECUTE the multi-row INSERT */                                                                    
  EXEC SQL EXECUTE my_insert USING :hva1, :hva2                     
    FOR :nrows ROWS;                                                 
  rc += check_sqlcode();                                              

  return rc;                                                        

}    

What has changed? The DECLARE section contains the same host variables as before, and adds two more:

●     sqlstmt - which is a structure-type VARCHAR in C, which will be populated with the INSERT SQL statement in the form “INSERT INTO MROWS VALUES (?, ?)”, the question marks identify the parameter markers. Here we can provide the table name from the user.

●     attrvar - is a NULL-terminated string populated with the FOR MULTIPLE ROWS ATOMIC clause and is used in the PREPARE statement.

The PREPARE statement then prepares the insert statement from the host variable sqlstmt using the attributes in attrvar. This is important as it declares the prepared statement for multi-rows. The prepared statement is named as my_insert.

The EXECUTE statement executes the prepared my_insert with the host variable arrays denoted in the USING clause. We are done with simple cases.

Dynamic SQL with SQLDA

Now, let’s generalize our test case a little bit more. What if the user provides the data, the table name, but we don’t know the table structure in advance? Then we must build the dynamic SQLDA - SQL descriptor area - describing the columns of the table and the corresponding input host variables for the INSERT statement. This use case can be useful for applications that act as generic loaders utilizing SQL. The data must be supplied together with the table description in order to be able to compose SQLDA; or DESCRIBE TABLE statement must be used to set SQLDA fields providing the information about the columns of a table.

In our case we keep it simple. Assume we get the table structure from the user (let it be the MROWS table with the two columns), and let’s build SQLDA in our code. The example of how we could perform the task follows (random_str() and check_sqlcode() functions are cut off, but are the same as in the first example):

#include <stdio.h>                                              
#include <stdlib.h>                                             
#include <string.h>                                              

EXEC SQL INCLUDE SQLCA;                                         
EXEC SQL INCLUDE SQLDA;                                         

int main(int argc, char **argv)                                         
{                                                                      
  EXEC SQL BEGIN DECLARE SECTION;                                      
  short nrows; /* number of rows in one multi-row insert */                                                          
  struct {short len; char sqlstmt[256];} sqlstmt; /* SQL text for PREPARE*/                      
  char attrvar[257]; /* attribute var for PREPARE */                                                   
  EXEC SQL END DECLARE SECTION;
  short ncols = 3; /* two columns in table + 1 for nrows host variable */                                       
  int rc = 0;                                                           

  /* allocate host variable arrays */                                                                      
  nrows = 10;                                                          
  int *hva1 = malloc(nrows*sizeof(int));                               
  char (*hva2)[9] = malloc(nrows*9);                                   

  /* allocate sqlda */                                                                
  struct sqlda *sqldaptr = calloc(SQLDASIZE(ncols));                 
  struct sqlvar *varptr = NULL;                                             
  if (!sqldaptr || !hva1 || !hva2)                                                     
    return 12;                                                         

  /* initialize SQLDA */                                               
  strcpy(sqldaptr->sqldaid,"SQLDA");                                 
  sqldaptr->sqldabc = SQLDASIZE(ncols);                              
  sqldaptr->sqln = ncols;                                             
  sqldaptr->sqld = ncols;                                            

  /* initialize first host variable */                                              
  varptr = &(sqldaptr->sqlvar[0]);                                    
  varptr->sqltype = 496; /* int */                                   
  varptr->sqllen = 4;    /* hv length */                                               
  varptr->sqldata = (char *) hva1;                                   
  varptr->sqlname.length = 8;                                        
  memcpy(varptr->sqlname.data,                                       
         "\x00\x00\x00\x00\x00\x01\x00\x00",                         
         varptr->sqlname.length);                                    
  memcpy(&varptr->sqlname.data[6], &nrows, sizeof(nrows));            

  /* initialize second host variable */                               
  varptr = &(sqldaptr->sqlvar[1]);                                   
  varptr->sqltype = 460; /* null-terminated varying char */          
  varptr->sqllen = 9;    /* hv length */                                                
  varptr->sqldata = (char *) hva2;                                   
  varptr->sqlname.length = 8;                                        
  memcpy(varptr->sqlname.data,                                       
         "\x00\x00\x00\x00\x00\x01\x00\x00",                          
         varptr->sqlname.length);                                    
  memcpy(&varptr->sqlname.data[6], &nrows, sizeof(nrows));            

 /* initialize a special host variable that represents the value
     for 'n' in a multiple-row INSERT statement.*/
  varptr = &(sqldaptr->sqlvar[2]);                                   
  varptr->sqltype = 500;  /* short int */                                             
  varptr->sqllen = 2;                                                
  varptr->sqldata = (char *) &nrows;                                 
  varptr->sqlname.length = 8;                                        
  memcpy(varptr->sqlname.data,                                        
         "\x00\x00\x00\x00\x00\x02\x00\x00", /* n in multiple rows */
         varptr->sqlname.length);                                    
 
  /* initialize data */                                                                      
  for (int i = 0; i < nrows; i++) {                                  
    hva1[i] = rand();                                                
    random_str(hva2[i], 8);                                          
  }                                                                   

  /* prepare the INSERT text */                                                                  
  strcpy(sqlstmt.sqlstmt, "INSERT INTO MROWS VALUES (?, ?)");
  sqlstmt.len = strlen(sqlstmt.sqlstmt);                             
 
  /* prepare the multi-row attribute for PREPARE */
  strcpy(attrvar, "FOR MULTIPLE ROWS ATOMIC");                       
 
  /* PREPARE the statement */
  EXEC SQL PREPARE my_insert ATTRIBUTES :attrvar FROM :sqlstmt;      
  rc += check_sqlcode();                                              
 
  /* EXECUTE the multi-row INSERT */
  EXEC SQL EXECUTE my_insert USING DESCRIPTOR :*sqldaptr;            
  rc += check_sqlcode();                                              
 
  /* free storage */
  free(hva1);                                                        
  free(hva2);
  free(sqldaptr);                                                       
   return rc;                                                         
}                                                                     

The main difference here is the setting up of SQL descriptor area (SQLDA); it deserves a detailed description:

SQLDA consists of two main parts:

  1. SQLDA header containing the eye catcher and number of SQLVAR entries (we have one entry for one host variable).
  2. SQLVAR entries describing the host variables. (please note that there are two types of SQLVAR entries - base and extended, we are using base only here. Extended are mainly for LOBs and distinct types) 

To allocate the required storage for SQLDA, we will need the number of columns (and therefore number of host variables) plus one extra host variable for ‘n’ in the multi-row INSERT. The ncols variable is therefore initialized to 3. The size of the whole SQLDA block (including SQLVAR entries) can be computed using the IBM provided macro SQLDASIZE.

Then we initialize the SQLDA header, and continue with setting the SQLVAR entries. We have three of them and in each we need to setup:

●     sqltype indicating the type of the host variable. IBM defines the possible sqltype values in the documentation, for example 496 indicates an INTEGER, 500 is a SMALLINT, 460 is a NULL-terminated type. Even number means there are no indicators, odd number means that the indicators are provided as well.

●     sqllen is the length of the host variable, integer is 4, smallint is 2

●     sqldata is the pointer to the host variable, in our case to a host variable array

●     sqlname has a bit specific usage. We set the length to 8, and EXECUTE then interprets fifth and sixth byte of sqlname as a flag indicating type of the host variable. In the first and second case it is a host variable array (flag 0x0001). In the last SQLVAR entry the flag is set to 0x0002 representing a special variable for ‘n’ in multi-row insert. The seventh and eighth bytes represent the dimension of the host variable array, if the previous two bytes indicate so (0x0001). These bytes are set to the array dimensions for the first and second SQLVAR entries.

Once we have all the SQLDA and SQLVAR set up, we continue with PREPARE and EXECUTE. The PREPARE statement is the same as in the previous example. The EXECUTE statement differs a bit as it does not use FOR n ROWS clause nor the USING, but uses USING DESCRIPTOR provided with the SQLDA in sqldaptr. Execution should insert all 10 rows.

Java

How would you implement multi-row insert in a pure JDBC Java application? The JDBC interface contains so called batched INSERT operation - the PreparedStatement class provides methods called addBatch() and executeBatch(). The first named method is used to add multiple rows to the batch, and the second named method executes the statement with multiple rows at once. But you also need to make sure the multi row feature is enabled (which should be by default), there is a Data Server driver Java property to handle this called enableMultiRowInsertSupport.

Let’s have a look at an example for our scenario:

package multirow;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Random;
 
public class Multirow {
 
    /* generate random string */
    private static String randomString(int num) {
     StringBuilder str = new StringBuilder();
     Random rnd = new Random();
        final String chars = "abcdefghijklmnopqrstuvwxyz";
        for (int i = 0; i < num; i++) {
            str.append(chars.charAt(rnd.nextInt(1000)%chars.length()));
        }
        return str.toString();
    }
 
    public static void main(String[] args) {
 
        String urlPrefix = "jdbc:db2:";
        String url;  /* connection url */
        Connection con; /* jdbc connection */
        int nrows = 10; /* number of rows in a batch */
        String user = user; /* TODO: provide user name */
        String password = password; /* TODO: provide password */
 
        /* prepare the url */
        url = urlPrefix + "//host:port/dbname"; /* TODO: correct the url */
        url = url + ":traceDirectory=" + System.getProperty("java.io.tmpdir")
                + ";traceFile=trace"
                + ";traceFileAppend=false"
                + ":traceLevel="
                + (com.ibm.db2.jcc.DB2BaseDataSource.TRACE_ALL) + ";";
        try {
            /* connect to DB2 */
            Class.forName("com.ibm.db2.jcc.DB2Driver"); // Load the driver
            // Create the connection using the IBM Data Server Driver for JDBC and SQLJ
            con = DriverManager.getConnection(url, user, password);
            con.setAutoCommit(false); // Commit changes manually
 
            String sql = "INSERT INTO MROWS VALUES (?, ?)";
            /* prepare the statement */
            PreparedStatement pstmt = con.prepareStatement(sql);
 
            /* generate the data */
            Random rnd = new Random();
            for (int i = 0; i < nrows; i++) {
                pstmt.setInt(1, rnd.nextInt(1000));
                pstmt.setString(2, randomString(8));
                pstmt.addBatch(); //add the data to batch
            }
 
            /* execute the batched insert */
            int[] count = pstmt.executeBatch();
            con.commit(); //commit the data
            con.close();  // Close the connection
        } catch (ClassNotFoundException e) {
            System.err.println("Could not load JDBC driver");
            System.out.println("Exception: " + e);
            e.printStackTrace();
        } catch (SQLException e) {
            System.err.println("SQLException information");
            while (e != null) {
                System.err.println("Error msg: " + e.getMessage());
                System.err.println("SQLSTATE: " + e.getSQLState());
                System.err.println("Error code: " + e.getErrorCode());
                e.printStackTrace();
                 // For drivers that support chained exceptions     
                e = e.getNextException();
            }
        }
    }
}

Beside the standard Java overhead and the similar code for generating the random data, the main things to notice are:

●     I usually set autocommit to false to manage the transactions myself
●     addbatch() adds the row for the batch processing, we just do one batch here
●     executebatch() executes the batched insert. The function returns an array indicating the update count for each individual row
●     the code also explicitly turns on the tracing so that you can check the batched insert has been converted to a multi-row insert.
●     finally clause is usually recommended for connection closing as opposite in the main code as in our example 

Last comment is about the atomicity and non atomicity of batch insert in the java environment. To manage this, there is a driver property called atomicMultiRowInsert, the default being non-atomic. This concludes the examples I wanted to cover, let’s see some outputs.

Experiments

I’ve performed few simple experiments to see the impact and benefits of multi-row inserts. I’ve decided to run the tests using Java from my workstation and connecting to DB2 for z/OS running on the mainframe. This experiment should reveal the performance gain caused by saving the Application (class 1) and DB2 elapsed time (class 2) due to reduction of DB2 calls from my workstation.

The source code is basically the same as in the Java example except I was running multiple batches instead just one; for single-row insert I am using execute() method of the PreparedStatement class for each insert separately. I’ve measured elapsed time and DB2 time using an unnamed DB2 monitor product. I’ve inserted 10000 records; it seems to be very small set, but due to the distributed access, the single row insert was pretty time consuming. Times are displayed in seconds and the numbers are averaged from multiple runs. The tests were executed on a DB2 version 10 subsystem.

chart1.jpg

chart2.jpg

What you can see is an enormous elapsed time when running in single-row insert mode. The vast majority of time is spent in communication - for each single row, we need to perform the overhead of sending it to DB2 individually. Using multi-row insert we definitely spare the traffic time, even if we just pass only 10 rows in a batch.

Now, the question could be, what’s the best row count for a multi-row insert? Well, as usual, it depends. First, there are some limits I am aware of:

  1. you can use multi-row insert for maximum of 32767 rows
  2. there is a 10M limit for the user data and control information for a single multi-row insert executed via DRDA

Second, the optimal value may depend on your application, the data you pass (number of columns), table space types, existing indexes, etc. Usually, 100 or 10 rows are treated as a good start.

Conclusion and final note

Multi-row inserts (and multi-row operations in general) are definitely a good practice to consider. If your application is going to perform multiple inserts, I would definitely recommend looking at this feature. The next step might require some experimenting and testing to tune for optimal performance, but I would be very surprised if any multi-row insert would perform significantly slower than a set of single row inserts in a loop.

At the end, I would also link to a recent discussion http://www.idug.org/p/fo/et/thread=44439 about multi-row insert not issued from application programs. It appears that multi-row insert for DB2 for z/OS is limited compared to DB2 LUW. While the VALUES clause in DB2 LUW says “Introduces one or more rows of values to be inserted.”, the DB2 for z/OS flavor of VALUES limits the usage to one row only “Specifies one new row in the form of a list of values.”. In other words, if you need to exploit the multi-row insert facility in DB2 for z/OS, your only choice at this moment is a specific application for that purpose. Hopefully, this blog will help you a bit in that matter.

References

●     Dynamically executing a data change statement

●     DB2 UDB for z/OS Version 8 Technical Preview

●     DB2 UDB for z/OS Version 8: Everything You Ever Wanted to Know, ... and More

●     DB2 UDB for z/OS Version 8 Performance Topics

●     INSERT

●     PREPARE

●     EXECUTE

●     Making batch updates in JDBC applications

●     SQLDA and SQLVAR

●     Common IBM Data Server Driver for JDBC and SQLJ properties for all supported database products

 

2 Likes
Recent Stories
Introduction to Db2 for z/OS System Profiles

10 questions to review and understand your active log data set configuration

DB2 for z/OS query support over FTP