Actual Format of Compressed data rows on Db2 for z/OS

Michael Hannan

Actual Format of Compressed data rows on Db2 for z/OS

I have seen presentations and articles concerning re-ordered Row Format and what part of data rows gets logged. I have read about logging from first to last changed byte.

In the real world most of our large tables are going to be compressed, so descriptions of the row format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of the row in terms of columns and their offsets for compressed is not described anywhere that I could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel. But that does little to help understand the actual format of the row. Are the columns re-ordered? Is there a bit string to indicate which columns are compressed? etc.

Perhaps this topic has been well covered before but I have not seen any good description of what a compressed row looks like?

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

 

James Campbell

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Michael Hannan)
Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>  

Michael Hannan

RE: Actual Format of Compressed data rows on Db2 for z/OS
(in response to James Campbell)

James,

I imagine that some columns in a row do not compress well at all, and maybe left in an uncompressed form or close to that. I doubt that a dictionary lookup entries would be wasted for short columns with high cardinality and non skewed distributions, but I have not studied Ziv Lempel in depth to know exactly how it works and assume IBM have modified it to suit their own needs. I am sure that long CHAR columns compress well due to trailing blanks, and some very short columns compress very poorly. Maybe compression works better if a whole lot of numeric columns of mostly zero  value are next to each other in the row (just wild speculation). 

It is possible that a compressed row is a string of 12 bit symbols that each get translated into either single character or a string of multiple chars, and that with RRF it is possible that the row only needs to get decompressed up to the point where the required column starts or the one after the desired column. It is possible that Varying length column offsets (each 2 bytes) for RRF are not compressed at all and still point to the start of a varying length column. This paragraph was all speculation. However I did read somewhere that Db2 11 gives us partial row decompression, only for RRF, and that columns needed for predicates (not indexed) might be decompressed first, and remainder needed by the SQL maybe decompressed later. The detail was a bit scant. However would make sense not to decompress some very large VARCHAR columns selected, if the row did not qualify.

Well yes, I always read the Performance Topics Redbooks, and more than one time, for any interesting topics. You will note section 4.14 has all these fancy diagrams showing the row format, which all seems completely useless for a compressed row which is not covered in detail diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to understand.

So how does Db2 find the starting position of any column in a compressed row? Do any compressed columns or strings of columns have a fixed offset inside the row? Does Db2 have to decompress the full row before it can look up any columns in the row? Is the hardware assist and compression dictionary so efficient that full row decompression can happen in a very short time? Does RRF mean that only part of the row need be decompressed in order to find a column at a given offset, in the non compressed row?

I hate things that work by mysterious means since it makes it harder to understand the performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only later worked out what a Getpage meant, and roughly what things had to be done, that explained the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not need to.
 
In Reply to James Campbell:

Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>  

 So far I never found a performance need for Index Compression, but that is another topic.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 21, 2017 - 08:03 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 21, 2017 - 08:21 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 21, 2017 - 08:23 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 21, 2017 - 08:24 AM (Europe/Berlin)

Peter Vanroose

Re: Actual Format of Compressed data rows on Db2 for z/OS
(in response to Michael Hannan)

In Reply to Michael Hannan:

I imagine that some columns in a row do not compress well at all, and maybe left in an uncompressed form or close to that. [...] I hate things that work by mysterious means since it makes it harder to understand the performance aspects. [...]

Michael,

LZ compression is actually relatively simple to understand:
First of all, a row is indeed either completely compressed, or not at all.
And the pages in the bufferpool are still in compressed state, so the data manager does the (de)compression.
Hence, it will only decompress the rows it needs, not the full page.
Lempel-Ziv (de)compression is not by column: a sequence of bytes is taken, and transformed based on a "code book" (= 2-column table of byte sequences: compressed & uncompressed). LZ is not aware of the meaning of the bytes (text, numbers, pointers, ...)

(De)compression is a sequential process, on a full row: bytes are processed from left to right, always starting from the beginning of the row. One cannot just decompress (say) column 2.

The good news, certainly with RRF, is that decompression can of course stop when the required data is found. Suppose you do not need any varchar data from the row, and just (say) columns 1 and 3. Then it suffices to decompress the bytes of columns 1, 2, and 3. And if column 2 is a varchar, you just need to decompress its two "pointer" bytes. This is one of the advantages of RRF.

On the other hand, with RRF, if you do need the data for varchar column 2, you cannot make use of that decompressed pointer value until you have decompressed up to that amount of bytes, i.e., somewhere past the end of column 3 and maybe further.

One last word: LZ compresses byte sequences, not characters. And not just repeated sequences of bytes, but any often occurring pattern. So it's not runlength encoding (RLE). And again: it's not column-based, so it's not limited by column boundaries.

Just a (stupid) example: suppose a table has two nullable integer columns (not consecutive), containing lots of values between 0 and 255. In that case it's very likely that at construction of the code book, the four bytes x'00000000', i.e., the NULL indicator plus the first three (binary) bytes of all those integers, end up as an entry in the left column of the code book, with a 1-byte compression "translation" as its second column field.

So it's possibly even an advantage that compression is across columns: two correlated columns (say: zip code & city name) may nicely compress together, provided they are (1) both varchar, or both not varchar, and (2) they are consecutive columns.

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

James Campbell

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Michael Hannan)
Ah, I think I see what is driving your questions. The secret is that there is no secret. An
RRF row is compressed in the same way that a BRF row is - just a string of bytes
compressed using the compression dictionary. So the position of any column in the
compressed data is variable - and might not be on a byte boundary.

Somewhere I recall reading that Db2 might decompress only the initial part of a row - if it
didn't need the remainder. But it was obscure (aka - we can change this as we think fit)
under what conditions it did this.

James Campbell

On 20 Dec 2017 at 23:43, Michael Hannan wrote:

>
> James,
> Well yes, I always read the Performance Topics Redbooks, and more than one time, for any
> interesting topics. You will note section 4.14 has all these fancy diagrams showing the row
> format, which all seems completely useless for a compressed row which is not covered in detail
> diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to
> understand.
> So how does Db2 find the starting position of any column in a compressed row? Do
> any compressed columns or strings of columns have a fixed offset inside the row? Does
> Db2 have to decompress the full row before it can look up any columns in the row? Is the
> hardware assist and compression dictionary so efficient that full row decompression can happen
> in a very short time? Does RRF mean that only part of the row need be decompressed in order to
> find a column at a given offset, in the non compressed row?
> I hate things that work by mysterious means since it makes it harder to understand the
> performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only
> later worked out what a Getpage meant, and roughly what things had to be done, that explained
> the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not
> need to.
>  
> In Reply to James Campbell:
> Did you read DB2 9 Performance Topics section 4.14
> http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf
>
> As far as I have ever known, either an entire row is compressed or none of it is. There is a
> flag in the row header indicating which.
>
> James Campbell
>
>
> On 18 Dec 2017 at 23:09, Michael Hannan wrote:
>
> >
> > I have seen presentations and articles concerning re-ordered Row Format and what part
> of data
> > rows gets logged. I have read about logging from first to last changed byte.
> > In the real world most of our large tables are going to be compressed, so descriptions of
> the row
> > format for RRF using non compressed rows is somewhat uninteresting to me. Yet the
> format of
> > the row in terms of columns and their offsets for compressed is not described anywhere
> that I
> > could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and
> Ziv-Lempel.
> > But that does little to help understand the actual format of the row. Are the columns
> re-ordered?
> > Is there a bit string to indicate which columns are compressed? etc.
> > Perhaps this topic has been well covered before but I have not seen any good
> description of
> > what a compressed row looks like?
> > Michael Hannan,
> > DB2 Application Performance Specialist
> > CPT Global Ltd
> >  
>
>
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>
>

Michael Hannan

RE: Actual Format of Compressed data rows on Db2 for z/OS
(in response to James Campbell)

James,

There is "no secret" only if someone is happy with a vague description of how it works. One can search for Ziv-Lempel and get a description of workings not necessarily pertaining to Db2 z/OS. There are always variations to these algorithms.

Did you read Db2 11 performance Topics "2.3 CPU reductions with selective decompression"? LOL It was not quite enough detail to give me full understanding. Obviously IBM did not want to invest in having the code cope with partial decompression for BRF. 

It is conceivable to me, that IBM may have adjusted Ziv-Lempel to their own needs and not necessarily compressed 100% of the full row. IBM may use 12 bit symbols for Ziv-Lempel, or may not. I did not want to speculate too far on what might be sensible for performance though, with no evidence. We do know that the row header is not compressed.

For many years there were recommendations for what order to put our columns in a table definition. Since the ways Db2 work for logging, and partial decompression keep changing, advising on a column sequence seems a bit pointless. Columns frequently used in predicates could be put towards the top of the column list, but that would only be useful if they are fixed length columns and the column predicates are not always evaluated in the index used (column not in index). So I am not sure that that any recommendations are really at all worth while, especially without a full understanding of how compression and decompression work (not just a rough understanding that makes assumptions).


In Reply to James Campbell:

Ah, I think I see what is driving your questions. The secret is that there is no secret. An
RRF row is compressed in the same way that a BRF row is - just a string of bytes
compressed using the compression dictionary. So the position of any column in the
compressed data is variable - and might not be on a byte boundary.

Somewhere I recall reading that Db2 might decompress only the initial part of a row - if it
didn't need the remainder. But it was obscure (aka - we can change this as we think fit)
under what conditions it did this.

James Campbell

 

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 22, 2017 - 07:20 AM (Europe/Berlin)

bernd oppolzer

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Michael Hannan)
MY knowledge on this topic may be a little bit out-dated, but I would
like to mention

- compression on z-Arch is done by hardware instruction, so compression
/ decompression
comes at little cost (compared to other platforms)

- even if compression is specified in the tablespace, rows are only
compressed, if they
get shorter during compression. If not, they are stored in the original
form (a bit in the
row header shows if compressed or not; this is controlled per row)

- column values in the index are not compressed, AFAIK. The values are
stored redundantly
in the index, to support indexonly scans etc., and the format may be
different from the
format in the rows in the tablespace, to support ascending / descending
and other things,
although I am not sure on this (only speculation). All the index
evaluation is IMO done
BEFORE decompression. decompression is done, when the values from the
rows are
really needed, that is: never on indexonly scans.

some statistics:

- on certain tablespaces, compression can reduce the size of the
tablespace to 50 %
of the original size, which in turn reduces the I/O time for tablespace
scans etc.

- only 2 to 5 % more CPU cost to do the uncompression.

(from some IBM classes)

HTH, kind regards

Bernd



Am 22.12.2017 um 07:16 schrieb Michael Hannan:
>
> James,
>
> There is "no secret" only if someone is happy with a vague description
> of how it works. One can search for Ziv-Lempel and get a description
> of workings not necessarily pertaining to Db2 z/OS. There are always
> variations to these algorithms.
>
> Did you read Db2 11 performance Topics "2.3 CPU reductions with
> selective decompression"? LOL It was not quite enough detail to give
> me full understanding. Obviously IBM did not want to invest in having
> the code cope with partial decompression for BRF.
>
> It is conceivable to me, that IBM may have adjusted Ziv-Lempel to
> their own needs and not necessarily compressed 100% of the full
> row. IBM may use 12 bit symbols for Ziv-Lempel, or may not. I did not
> want to speculate too far on what might be sensible for performance
> though, with no evidence. We do know that the row header is not
> compressed.
>
> For many years there were recommendations for what order to put our
> columns in a table definition. Since the ways Db2 work for logging,
> and partial decompression keep changing, advising on a column sequence
> seems a bit pointless. Columns frequently used in predicates could be
> put towards the top of the column list, but that would only be useful
> if they are fixed length columns and the predicates are not always
> evaluated in the index used. So I am sure that that any
> recommendations are really at all worth while, especially without a
> full understanding of how compression and decompression work (not just
> a rough understanding that makes assumptions).
>
>

Michael Hannan

RE: Actual Format of Compressed data rows on Db2 for z/OS
(in response to bernd oppolzer)

In Reply to bernd oppolzer:

- column values in the index are not compressed, AFAIK. The values are stored redundantly
in the index, to support indexonly scans etc., and the format may be different from the
format in the rows in the tablespace, to support ascending / descending
and other things, although I am not sure on this (only speculation). All the index
evaluation is IMO done BEFORE decompression. decompression is done, when the values from the
rows are really needed, that is: never on indexonly scans.

Bernd,

The format of Index Pages and Entries is fairly well described in the "Db2 for z/OS Diagnosis Guide and Reference" if you are able to get hold of one. I have only seen a copy of a quite old one, recently. It also describes Data Page and Row formats. It does not contain much information about what compressed data rows look like, other than Compression Dictionary entries seemed to be 8 Bytes, each (in the old manual).

So your speculation and opinions on indexes are correct.

So you are certainly right that Index Entry format is somewhat different to data row format. Yes Index Entries are not compressed, other than non-leaf Index entry keys can be truncated to just the leading portion needed to be strictly higher than highest key present in the child page. Index entries, can contain a RID count when not Unique, an Index Key, and one or more RIDs each preceded by a RID Flag, indicated things like Pseudo Deleted, Potentially Uncommitted, etc. The is also an array of pointers to the key entries, to enable keys on the page to be subjected to Binary Search. Db2 indexes are binary searched within a page for a key probe lookup, and tree searched on the broader level, or leaf pages scanned using forward and backward leaf page pointers for sequential scan access. Index Page level compression is not very interesting (to me), and as yet I have never seen a performance need for it. Maybe someone else has. One of the reasons to understand Index Entry format, is to be able to estimate the number of levels in a proposed index. It is possible to calculate this roughly.

A Curiosity, in old days before Db2 V4, we had Type 1 Indexes which had sub-pages so that a lock could be taken on just a sub-page. Type 2 Indexes then replaced Type 1 indexes, eliminating Index Logical Page Locks (Type 2 are Latched briefly), and hence no more sub-pages. Irritatingly, I still occasionally hear the misnomer "sub-page" processing used. Also Type 2 Indexes eliminated the need to search long chains of duplicate RIDs, by keeping them in sorted sequence, allowing a normal Index probe tree search for a RID to be removed when a  row is Deleted.

I don't quite understand why Type 2 Indexes do not truncate index prefixes when all entries on a page contain the same key prefix. Perhaps the overhead of update, when the hypothetical duplicate prefix needed to change, was too great. Perhaps Index Page file compression makes use of the fact that entry prefixes may well be repeated. However compressed Index pages are decompressed before arriving in the bufferpool, which seems to indicate they are targeted at disk storage space and I/O. They don't affect Getpage counts at all. I have seen Db2 Customers using larger Index page sizes (> 4K) in order to reduce the number of levels in the index (and reduce Getpage counts for scanning SQLs), mostly without using any Index page compression.

Is Index page compression a "solution looking for a problem"? Maybe someone found a requirement for it?

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 24, 2017 - 03:46 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Dec 24, 2017 - 04:11 AM (Europe/Berlin)

Wayne Driscoll

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Michael Hannan)
If I recall correctly, Db2 decompresses the page as it gets read into the buffer pool, compression is only for data on disk (or in an image copy of course). The

Sent from my iPhone

On Dec 21, 2017, at 12:43 AM, Michael Hannan <[login to unmask email]<mailto:[login to unmask email]>> wrote:


James,

Well yes, I always read the Performance Topics Redbooks, and more than one time, for any interesting topics. You will note section 4.14 has all these fancy diagrams showing the row format, which all seems completely useless for a compressed row which is not covered in detail diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to understand.

So how does Db2 find the starting position of any column in a compressed row? Do any compressed columns or strings of columns have a fixed offset inside the row? Does Db2 have to decompress the full row before it can look up any columns in the row? Is the hardware assist and compression dictionary so efficient that full row decompression can happen in a very short time? Does RRF mean that only part of the row need be decompressed in order to find a column at a given offset, in the non compressed row?

I hate things that work by mysterious means since it makes it harder to understand the performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only later worked out what a Getpage meant, and roughly what things had to be done, that explained the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not need to.

In Reply to James Campbell:

Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>



Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

Wayne Driscoll

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Wayne Driscoll)
Sorry, pressed send when I meant to edit. The page is read into the buffer pool in compressed format and the rows are uncompressed at first access. So compressed pages are read in, rows uncompressed as needed and unneeded rows don’t get decompressed. If however a row is updated, then all rows on the page may need to be uncompressed to build the new page. Since the row is uncompressed at access time, the offset logic, either BRF or RRF, can be used to locate column data.
Wayne Driscoll
Rocket Software
All opinions are mine alone

Sent from my iPhone

On Dec 28, 2017, at 8:38 AM, Wayne Driscoll <[login to unmask email]<mailto:[login to unmask email]>> wrote:

If I recall correctly, Db2 decompresses the page as it gets read into the buffer pool, compression is only for data on disk (or in an image copy of course). The

Sent from my iPhone

On Dec 21, 2017, at 12:43 AM, Michael Hannan <[login to unmask email]<mailto:[login to unmask email]>> wrote:


James,

Well yes, I always read the Performance Topics Redbooks, and more than one time, for any interesting topics. You will note section 4.14 has all these fancy diagrams showing the row format, which all seems completely useless for a compressed row which is not covered in detail diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to understand.

So how does Db2 find the starting position of any column in a compressed row? Do any compressed columns or strings of columns have a fixed offset inside the row? Does Db2 have to decompress the full row before it can look up any columns in the row? Is the hardware assist and compression dictionary so efficient that full row decompression can happen in a very short time? Does RRF mean that only part of the row need be decompressed in order to find a column at a given offset, in the non compressed row?

I hate things that work by mysterious means since it makes it harder to understand the performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only later worked out what a Getpage meant, and roughly what things had to be done, that explained the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not need to.

In Reply to James Campbell:

Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>



Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

Joe Geller

RE: Actual Format of Compressed data rows on Db2 for z/OS
(in response to Wayne Driscoll)

Wayne,

I think a couple of things you said are not quite correct (although I could be wrong).

1) "the rows are uncompressed at first access."   I would think the rows have to be uncompressed at each access (and the uncompressed row is in another area of memory to be processed).  If the row were only uncompressed at first access, the uncompressed row would have to be stored in the bufferpool - but where? The bufferpool page is the size of the tablespace page, so you can't expand one row in the bufferpool page.

2) "If however a row is updated, then all rows on the page may need to be uncompressed to build the new page."  I don't think that is necessary.  Updating a compressed row is in many ways similar to updating a VARCHAR column in a non-compressed table.  If the size of the row has stayed the same or decreased due to the update, the row (after re-compression) fits right back into the same spot on the page.  If it has grown and there in not enough room on the page, then it is moved to another page, with an anchor where it was (this is an Indirect reference NEARINDREF or FARINDREF depending on how far away the new page is).  If it has grown and there is enough room on the page, then Db2 has to move some of the rows around on the page to fit this one in.  But, none of these cases require uncompressing the other rows on the page.

Joe

In Reply to Wayne Driscoll:

Sorry, pressed send when I meant to edit. The page is read into the buffer pool in compressed format and the rows are uncompressed at first access. So compressed pages are read in, rows uncompressed as needed and unneeded rows don’t get decompressed. If however a row is updated, then all rows on the page may need to be uncompressed to build the new page. Since the row is uncompressed at access time, the offset logic, either BRF or RRF, can be used to locate column data.
Wayne Driscoll
Rocket Software
All opinions are mine alone

Sent from my iPhone

On Dec 28, 2017, at 8:38 AM, Wayne Driscoll <[login to unmask email]<mailto:[login to unmask email]>> wrote:

If I recall correctly, Db2 decompresses the page as it gets read into the buffer pool, compression is only for data on disk (or in an image copy of course). The

Sent from my iPhone

On Dec 21, 2017, at 12:43 AM, Michael Hannan <[login to unmask email]<mailto:[login to unmask email]>> wrote:


James,

Well yes, I always read the Performance Topics Redbooks, and more than one time, for any interesting topics. You will note section 4.14 has all these fancy diagrams showing the row format, which all seems completely useless for a compressed row which is not covered in detail diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to understand.

So how does Db2 find the starting position of any column in a compressed row? Do any compressed columns or strings of columns have a fixed offset inside the row? Does Db2 have to decompress the full row before it can look up any columns in the row? Is the hardware assist and compression dictionary so efficient that full row decompression can happen in a very short time? Does RRF mean that only part of the row need be decompressed in order to find a column at a given offset, in the non compressed row?

I hate things that work by mysterious means since it makes it harder to understand the performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only later worked out what a Getpage meant, and roughly what things had to be done, that explained the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not need to.

In Reply to James Campbell:

Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>



Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

Philip Sevetson

Actual Format of Compressed data rows on Db2 for z/OS
(in response to Joe Geller)
Wayne and Joe,
Joe’s correction agrees with what I recall of DB2’s access of compressed rows.

1) Rows are compressed, whole, by Lempel-Ziv token substitution. They are expanded for each access of the retrieved page.

2) An updated row would need to be re-compressed, but the other rows in the page are not changed by a given row being re-compressed.

a. Depending on the ending size of the updated row, the rows on the page may need to be rearranged or the re-compressed row may need to be moved to a new page.
--Phil Sevetson

From: Joe Geller [mailto:[login to unmask email]
Sent: Thursday, December 28, 2017 10:24 AM
To: [login to unmask email]
Subject: [DB2-L] - RE: Actual Format of Compressed data rows on Db2 for z/OS


Wayne,

I think a couple of things you said are not quite correct (although I could be wrong).

1) "the rows are uncompressed at first access." I would think the rows have to be uncompressed at each access (and the uncompressed row is in another area of memory to be processed). If the row were only uncompressed at first access, the uncompressed row would have to be stored in the bufferpool - but where? The bufferpool page is the size of the tablespace page, so you can't expand one row in the bufferpool page.

2) "If however a row is updated, then all rows on the page may need to be uncompressed to build the new page." I don't think that is necessary. Updating a compressed row is in many ways similar to updating a VARCHAR column in a non-compressed table. If the size of the row has stayed the same or decreased due to the update, the row (after re-compression) fits right back into the same spot on the page. If it has grown and there in not enough room on the page, then it is moved to another page, with an anchor where it was (this is an Indirect reference NEARINDREF or FARINDREF depending on how far away the new page is). If it has grown and there is enough room on the page, then Db2 has to move some of the rows around on the page to fit this one in. But, none of these cases require uncompressing the other rows on the page.

Joe

In Reply to Wayne Driscoll:
Sorry, pressed send when I meant to edit. The page is read into the buffer pool in compressed format and the rows are uncompressed at first access. So compressed pages are read in, rows uncompressed as needed and unneeded rows don’t get decompressed. If however a row is updated, then all rows on the page may need to be uncompressed to build the new page. Since the row is uncompressed at access time, the offset logic, either BRF or RRF, can be used to locate column data.
Wayne Driscoll
Rocket Software
All opinions are mine alone

Sent from my iPhone

On Dec 28, 2017, at 8:38 AM, Wayne Driscoll <[login to unmask email]<mailto:[login to unmask email]>><mailto:[login to unmask email]%3e%3e> wrote:

If I recall correctly, Db2 decompresses the page as it gets read into the buffer pool, compression is only for data on disk (or in an image copy of course). The

Sent from my iPhone

On Dec 21, 2017, at 12:43 AM, Michael Hannan <[login to unmask email]<mailto:[login to unmask email]>><mailto:[login to unmask email]%3e%3e> wrote:


James,

Well yes, I always read the Performance Topics Redbooks, and more than one time, for any interesting topics. You will note section 4.14 has all these fancy diagrams showing the row format, which all seems completely useless for a compressed row which is not covered in detail diagrams at all in terms of what it looks like. Possibly it is too complex for mere mortals to understand.

So how does Db2 find the starting position of any column in a compressed row? Do any compressed columns or strings of columns have a fixed offset inside the row? Does Db2 have to decompress the full row before it can look up any columns in the row? Is the hardware assist and compression dictionary so efficient that full row decompression can happen in a very short time? Does RRF mean that only part of the row need be decompressed in order to find a column at a given offset, in the non compressed row?

I hate things that work by mysterious means since it makes it harder to understand the performance aspects. In my early days of Db2 I heard what it costs to do a Getpage, but only later worked out what a Getpage meant, and roughly what things had to be done, that explained the significant cost, most importantly when Db2 needs to do a Getpage, and when it does not need to.

In Reply to James Campbell:

Did you read DB2 9 Performance Topics section 4.14
http://www.redbooks.ibm.com/redbooks/pdfs/sg247473.pdf

As far as I have ever known, either an entire row is compressed or none of it is. There is a
flag in the row header indicating which.

James Campbell


On 18 Dec 2017 at 23:09, Michael Hannan wrote:

>
> I have seen presentations and articles concerning re-ordered Row Format and what part of data
> rows gets logged. I have read about logging from first to last changed byte.
> In the real world most of our large tables are going to be compressed, so descriptions of the row
> format for RRF using non compressed rows is somewhat uninteresting to me. Yet the format of
> the row in terms of columns and their offsets for compressed is not described anywhere that I
> could see. Perhaps it is a big IBM secret. Yes I understood there is a dictionary and Ziv-Lempel.
> But that does little to help understand the actual format of the row. Are the columns re-ordered?
> Is there a bit string to indicate which columns are compressed? etc.
> Perhaps this topic has been well covered before but I have not seen any good description of
> what a compressed row looks like?
> Michael Hannan,
> DB2 Application Performance Specialist
> CPT Global Ltd
>



Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

-----End Original Message-----

================================
Rocket Software, Inc. and subsidiaries ■ 77 Fourth Avenue, Waltham MA 02451 ■ Main Office Toll Free Number: +1 877.328.2932
Contact Customer Support: https://my.rocketsoftware.com/RocketCommunity/RCEmailSupport
Unsubscribe from Marketing Messages/Manage Your Subscription Preferences - http://www.rocketsoftware.com/manage-your-email-preferences
Privacy Policy - http://www.rocketsoftware.com/company/legal/privacy-policy
================================

This communication and any attachments may contain confidential information of Rocket Software, Inc. All unauthorized use, disclosure or distribution is prohibited. If you are not the intended recipient, please notify Rocket Software immediately and destroy all copies of this communication. Thank you.

-----End Original Message-----
**This e-mail, including any attachments, may be confidential, privileged, or otherwise legally protected. It is intended only for the addressee. If you received this e-mail in error or from someone who was not authorized to send it to you, do not disseminate, copy, or otherwise use this e-mail or its attachments. Please notify the sender immediately by reply e-mail and delete the e-mail from your system.**