DB2 LUW and the INGEST utility

Tom Glaser

DB2 LUW and the INGEST utility

Because of HADR reasons, and following IBM’s suggestion of using the INGEST utility rather than the load utility, I’m looking for help with the following questions:

  • An example of a fully functional INGEST syntax that we can invoke thru a shell script in the native Linux environment. Would need to have the example work with a positional file for a table with Null-able columns. We are not sure how the INGEST works for tables with NULL columns and the associated positional load files.
  • Any resources that we can add to the application server or the DB server to increase the throughput for the INGEST load jobs… given that during the schedules we could have jobs running in parallel doing load activities.
  • What is the highest volumes of data coming from mainframe and going into LUW…and what kind of performance numbers did you see?
  • Is there a 3rd party vendor tool that might work better than INGEST?

 Thanks, Tom

Janttu Lindroos

RE: DB2 LUW and the INGEST utility
(in response to Tom Glaser)

We got significant improvements to INGEST speed when tuned num_flushers_per_partition, num_formatters and shm_max_size parameters using INGEST SET command. Furthermore, Db2 version 11.1.2.2 increased INGEST speed significantly compared to 11.1.1.1. In our tests just fix pack upgrade more than doubled the ingesting speed.

 

INGEST has many advantages (or just differences depending on the use cases) to LOAD even though it’s slower method than LOAD. INGEST uses buffer pools so it speeds up the query performance when querying newly ingested data. LOAD does not use buffer pools so the first time you select the data it is always retrieved from the disk. INGEST is also (transaction) logging operation. Concurrency with INGEST is much better than with LOAD because INGEST uses row level locking. LOAD locks the whole table, so you can’t query data while loading unless you use ALLOW READ ACCESS that is deprecated in Db2 version Db2 10.1 FP1 and not allowed at all in BLU tables.

 

There are also some error handling differences between LOAD and INGEST. For example, if the value ’abc’ is tried to insert into the table’s DECIMAL column LOAD sets incorrect value to null and accepts the row as INGEST rejects the row.

 

Performance figures depend on many factors: the Db2 physical server itself, INGEST parameters (tuning effort is rewarding), numbers of rows in an ingested file(s) etc. We have been able to INGEST on average more than 105 000 rows per second in 94 GB memory 12 CPUs Linux server.

 

There are also couple of APARs related to INGEST and error logging to diagnostic logs (IT22464 and IT22148 where error should be info) but more importantly there is a bug that commit_count is not valid if num_flushers_per_partition is more than one (IT23359). In our case we got the best performance when set num_flushers_per_partition to match the number of logical CPUs in the database server. That means that Db2 commits more frequently than specified in commit_count ingest parameter.