IDUG Content Committee

 View Only

Db2 Columnar Database Page-Based String Compression

By Joe Geller posted Mar 28, 2023 12:59 PM


Db2 Columnar Database Page-Based String Compression

Written by Ron Liu

For many customers, string data dominates storage cost. Frequency-based compression is not effective for high cardinality datasets.  So percentage of values encoded for string data might be low, i.e., < 10%. The following is an example of the common string data that your database deals with.

Product Description VARCHAR

New Blue dress with unicorns girls size 6X

3 Red dress with hearts girls size 6X

1 Red dress with bears girls size 6X

Many Blue uniform shirt boys size 5

Old Red uniform shirt boys size 5


As shown in the table above, although there are repeating words in the column data, each value in the column is distinct hence full string compression can’t apply. Also the values don’t share the same prefix. Hence prefix encoding can’t apply either. So encoding using column dictionary will not have good coverage for these data.

Some string numeric and hex data that is also challenging for encoding using the column dictionary. Some examples of these type of data look like the following.


        HEX: 59721B038CB9AC5A5DD09055529A64CA4D000000

        Postal-codes: 95133-7670

        Telephone Numbers: 1-408-775-6978

        Date:12/21/2003 (or other date formats)

        Time:11:32:02 (or other time formats)

        CDR (Call Detail Record for telecommunications):00650068D34B41799911903603


These data don’t have repeating patterns. But they have no more than 16 distinct characters.

To address these limitations with encoding, Page-based string compression was introduced in Db2 v11.5.4. It uses two different schemes. Page-based string compression type 1 that uses LZ4 compression and type 2 that uses nibble compression (aka simplified Huffman). Supported data types include the following.





Page-based string compression type 1 works well with string data with repeating patterns that are of length at least 4 bytes long. Free flowing text as seen in comments, articles, manuals, descriptions, etc. normally work well with type 1.

Page-based string compression type 2 is a new simplified Huffman compressor that encodes each byte using a 4-bit code, which provides 2X compression. It is very effective for compressing high cardinality numeric, hex, date, time, and timestamp data stored in string data types. For type 2, data must have no more than 16 distinct characters per compressed data page. 

Db2 automatically determines which compression scheme to use to have the best possible compression.

Registry Variables

Db2 warehouse does not support fallback so new features are often enabled by default. The registry variable for page-based string compression is DB2_COL_STRING_COMPRESSION . But for Db2 customers, if you are on 11.5.7 or later and have no plan to fallback prior to 11.5.7, it is recommended that you set the following registry variables together for overall best performance and compression.  See the IDUG blog COLUMNAR DATA ENGINE (CDE) STORAGE NEW FEATURE.




Note, after these settings taking effect, all new data picks up the new compression algorithms. But the existing data stays as it.


Page-based string compression has insignificant impact on query performance. The CPU overhead for decompressing the data is offset by reduced IO and saving on bufferpool space.  Verified with the "Simple sales analytics workload" and "Complex sales analytics workload" benchmarks, performance difference with code-based string compression enabled vs. disabled is no more than 3%, within test result deviation. Minimal additional memory requirement for holding decompressed data. Decompression buffers are allocated per page and only exist while processing data compressed with the page-based string compression enabled.

With page-based string compression enabled, as proven in the lab, compression ratio for string data normally falls in the range of 2x to 8x. Some significant compression improvements were also observed in customers.


Storage Saving

Page-based String Compression Type

Additional Info

A Commercial Bank

26% (49.5 TB to 36.5 TB)

1 & 2

First IIAS customer upgraded to Db2 v11.5.4

A Commercial Bank

49% (24.3 TB to 13.4 TB)


Type 2 saving additional 16.5%, estimated with a subset of typical tables



A Government Tax Agency



A Countrywide Marketplace Owner

55% (1.7 TB to 800 GB)

1 & 2

Saving over other database vendor for large VARCHAR column

How to Verify if Page-based String Compression is in Effect

The following shell script can be used to verify if page-based string compression is in effect.

# change the schema and table name accordingly




# change the db name accordingly



# provide the output file name



$ db2 connect to $database


     Database Connection Information


 Database server             = DB2/LINUXX8664 11.5.8

 SQL authorization ID   = RONLIU

 Local database alias      = BLUDB


oi=`db2 -x "select TABLEID from syscat.tables where TABSCHEMA='$schema' and TABNAME='$table'"`

tsi=`db2 -x "select TBSPACEID from syscat.tables where TABSCHEMA='$schema' and TABNAME='$table'"`


$ time db2dart $database /DC CIO /OI $oi /TSI $tsi /PS 0 /NP 0 /V N /RPTN $rptn 2>&1 > /dev/null


real    0m2.34s

user    0m0.52s

sys     0m0.75s


# check command execution is successful, it will generate a report, showing the number of String Compressed pages.


$ grep "Number of Pages String Compressed" $rptn

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 74

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 0

         Number of Pages String Compressed = 832


# if it complaints object id not found, it is because the table doesn't reside on the current node. You can specify another node to connect to.

# another node for db2dart to connect to. For example


#        export DB2NODE=2


# then rerun the db2dart command


As observed in lab testing and reported by customers, page-base string compression drastically improves storage savings on string data while not impacting and in some cases improving performance. The feature should be turned on in Db2 for best compression.