CLOB simulation (V5): page size

Ulf-Otto Cihak

CLOB simulation (V5): page size
Dear all,
we have to store documents (approx. 60K on average, varying considerably) in DB2/OS390 V5. The application sees one document as one big chunk, so it has no structure known to DB2.
Since trailing blanks may be significant, we will store the documents in VARCHAR (not fixed-length CHAR plus compression). Nevertheless, we will use DB2 compression for this tablespace.
To store/retrieve one document to/from DB2, we need a segmentation layer between the application and DB2 (a rough sketch follows below). I see the following alternatives:
- segment into 4K blocks, and use a 4KB tablespace
- segment into 32K blocks, and use a 32KB tablespace
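
For concreteness, here is roughly how I picture the segmentation layer. Everything in it is only illustrative: the table DOCUMENT_SEGMENT and its columns DOC_ID, SEQ_NO and SEGMENT are placeholder names, and the JDBC calls merely stand in for whatever data access layer we actually end up using.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    public class DocumentSegmenter {

        private final int chunkSize;   // e.g. 4096, or just under 32K, depending on the page size chosen

        public DocumentSegmenter(int chunkSize) {
            this.chunkSize = chunkSize;
        }

        /** Split one document into fixed-size chunks and store one row per chunk. */
        public void store(Connection con, String docId, String document) throws SQLException {
            String sql = "INSERT INTO DOCUMENT_SEGMENT (DOC_ID, SEQ_NO, SEGMENT) VALUES (?, ?, ?)";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                int seq = 0;
                for (int off = 0; off < document.length(); off += chunkSize) {
                    String chunk = document.substring(off, Math.min(off + chunkSize, document.length()));
                    ps.setString(1, docId);
                    ps.setInt(2, seq++);
                    ps.setString(3, chunk);   // VARCHAR column, so trailing blanks in each chunk survive
                    ps.executeUpdate();
                }
            }
        }

        /** Reassemble a document by reading its chunks back in sequence order. */
        public String load(Connection con, String docId) throws SQLException {
            String sql = "SELECT SEGMENT FROM DOCUMENT_SEGMENT WHERE DOC_ID = ? ORDER BY SEQ_NO";
            StringBuilder doc = new StringBuilder();
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setString(1, docId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        doc.append(rs.getString(1));
                    }
                }
            }
            return doc.toString();
        }
    }

The chunk size would be the only knob to turn between the two alternatives above.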
The V5 administration guide says:
'DB2 provides a 32KB page size to allow for long records. You can improve performance by using 32KB pages for record lengths that are longer than 2028 bytes, as in some text-processing or image-processing applications.'
I have not found further information about this performance improvement in the documentation, redbooks, or the DB2-L archives. My understanding up to now has been that 32K pages always require 8 internal I/Os, even if the page is not fully used. Therefore, I do not understand where the improvement would come from: a 35K text object would then require 9 I/Os with a 4K page size, but 16 I/Os with a 32K page size.
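
For clarity, this is the back-of-the-envelope calculation behind those numbers, under my assumption that every 32K page always costs 8 internal I/Os:

    public class PageIoEstimate {
        public static void main(String[] args) {
            int docKB    = 35;                     // uncompressed document size in KB
            int ios4K    = (docKB + 4 - 1) / 4;    // ceil(35/4)  = 9 pages  -> 9 I/Os
            int pages32K = (docKB + 32 - 1) / 32;  // ceil(35/32) = 2 pages
            int ios32K   = pages32K * 8;           // 8 internal I/Os per 32K page -> 16 I/Os
            System.out.println(ios4K + " I/Os (4K) vs. " + ios32K + " I/Os (32K)");
        }
    }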

Your thoughts on this would be greatly appreciated.
Regards,
Ulf Cihak
Nuernberg, Germany



Chris Blaicher

Re: CLOB simulation (V5): page size
(in response to Ulf-Otto Cihak)
If I were doing it, I would use 32K pages and, in the intermediate layer, break the CLOB into 4K or 8K chunks. I would also use DB2 compression.

Here is my reasoning.

1) You have to use the intermediate layer because there is no table type
that supports a page longer than 32K.

2) If you used a 4K page and broke the CLOB into 4K rows, then unless compression were always greater than 50%, you would never fit more than one row per page. Breaking the CLOB into smaller logical rows, such as 2K or 1K, carries the cost of repeating the key for each row segment.

3) Remembering that DB2 compression works at the row level, it is very easy to build a 4K chunk row and let DB2 compress it. Using a 32K page size results in much less wastage, since you only have wasted space at the end of one-eighth as many pages.

4) Don't worry too much about a logical image that spans two different 32K pages. While it is true that a 32K page is written as eight 4K CIs, the media manager can read all eight CIs with a single I/O.

5) The choice of 4K or 8K chunk sizes (or any other size, for that matter) is one for you to decide after some experimentation; a rough estimator is sketched below. The main thing to keep in mind is the amount of wasted space at the end of each DB2 page. With a 50% compression ratio and 4K chunks, you can figure on about 1K wasted per page. If you get higher compression, you will either waste less per page, or you can afford to go to a larger chunk size.
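
To put point 5 into something you can experiment with, here is a rough estimator. It is only a sketch under simplifying assumptions: it ignores DB2 page and row overhead (page header, ID map, the repeated key columns), so use the numbers only as relative guidance when comparing chunk sizes.

    public class WasteEstimate {

        /** Very rough bytes wasted per document for a given chunk size, page size,
            and assumed compression ratio (e.g. 0.5 for 50% compression). */
        static int wastedBytes(int docBytes, int chunkBytes, int pageBytes, double ratio) {
            int rowBytes    = (int) Math.ceil(chunkBytes * ratio);        // compressed chunk row
            int rowsPerPage = Math.max(1, pageBytes / rowBytes);          // full rows per page
            int totalRows   = (docBytes + chunkBytes - 1) / chunkBytes;   // chunks per document
            int pages       = (totalRows + rowsPerPage - 1) / rowsPerPage;
            return pages * pageBytes - totalRows * rowBytes;              // unused space overall
        }

        public static void main(String[] args) {
            // 60K document, 4K chunks, 32K pages, 50% compression
            System.out.println(wastedBytes(60 * 1024, 4 * 1024, 32 * 1024, 0.5) + " bytes wasted");
        }
    }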

These are just my thoughts. I have never had to do this exactly. As
always, you should do your homework and the implementation is left as an
exercise for the student.

I would be interested in what you decide and what kind of results you get.

Chris Blaicher
BMC Software, Inc.
Austin Research Labs
