Placement of the Varchar Columns

Jorg Lueke

Placement of the Varchar Columns
List Members,

I am getting confused the more I look into the recommended placement of
Varchar colunms fro DB2 for z/OS V7. In the admin guide, the recommendation
for read access of all types is to put varchars at the end. This is
undisputed. When we talk about tables that are updated frequently the
recommendations change based on what I read.

One company put out Top 20 performance myths and in it one of the myths is
that Varchars should always be placed at the end of the row. They argue
that frequently updated columns should be placed after Varchar columns
because varyling length rows are logged from the first updated byte to the
end of the row. Thus why log a whole varchar? However the admin guide
tells a slightly different story:

"For varying-length rows, data is logged from the first changed byte to the
end of the last updated column. (A varying-length row contains one or more
varying-length columns.)"

Based on this statement as long as the Varchar is not between updated
columns it won't get logged.

Question #1: For varying length rows is the update logged from the first
changed byte to the end of teh row or from teh first changed byte to the
last upadted column.

Interestingly, the admin guide then goes on to say:

"Keep all frequently updated
columns near the end of the row.
However, if only fixed-length
columns will be updated, keep
those columns close to each other
at the beginning of the row."

Question #2: What is the point of this statement? If I am updating only
fixed length columns I can keep the varchar at the end. So, why not simply
say Varchars should always be at the end of the row? Based on the statemtn
above it's just saying keep updated varchars after non-updated varchar, or
am I missing something here...

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Rob --- Sr. Database Administrator --- CFS Crane

Re: Placement of the Varchar Columns
(in response to Jorg Lueke)
By placing columns that get updated together as a set, you can limit what has to be logged. V7 introduced the ability to not have to log to the end of the row for varchar tables assuming conditions are meet (TS not using compression, table does not have edit proc, the update of variable column does not change length).

The attached PowerPoint slide might help. The notes below are some items we try to follow. There are many things that prevent the ideal situation from occurring in the real world, and the guidelines below reflect the world I get to work in so may not pertain to your environment.


<<Minimize Logging.ppt>>
Generic guidelines for varchar column placement

1) Instances where column placement does not effect what is logged for an update, since the entire row is logged in these scenarios.
a. Tablespace has compression turned on.
b. Table has an editproc.
c. Update of variable column changes length (increase or decrease).
2) Column placement to minimize the impact to the business should be a goal.
a. Business needs may change assumptions about what we consider to be "static" varchar columns. As the business grows it is not safe to assume what is a static varchar today will be a static varchar tomorrow.
b. Column groupings or "sets" can change with application releases and are difficult to flush out.
c. For existing structures ease of maintenance along with shorter change outages favor adding columns to end of existing tables rather than unloading, dropping, recreating and loading the new structure to achieve the column placement that might help log performance.
3) New Table Creates
a. Update groupings not yet established.
b. Presenting a consistent methodology to the programmers we support is desirable.
c. VARCHAR will be placed at end of the table, unless those columns are part of the primary key, in which case they will be placed in primary key order at the beginning of the table.
d. Tablespace compression should be favored. Going forward we should show why "not" to use compression, instead of why to use compression.
i. Exception for image data.
ii. Exception for data that does not compress well(DSN1COMP figures).
iii. Compression caused a performance problem.
1. Some update intensive applications could possibly see decreased performance with compression in rare cases.
2. Having more rows on a page typically increases performance.
3. Commit logic needs to be present since each page should have more rows and lack of committing could increase lock contention in batch jobs. Those batch jobs should be sorting their input files in clustering order and committing properly.



-----Original Message-----
From: DB2 Data Base Discussion List [mailto:[login to unmask email] On Behalf Of Jorg Lueke
Sent: Thursday, December 23, 2004 9:50 AM
To: [login to unmask email]
Subject: Placement of the Varchar Columns

List Members,

I am getting confused the more I look into the recommended placement of
Varchar colunms fro DB2 for z/OS V7. In the admin guide, the recommendation
for read access of all types is to put varchars at the end. This is
undisputed. When we talk about tables that are updated frequently the
recommendations change based on what I read.

One company put out Top 20 performance myths and in it one of the myths is
that Varchars should always be placed at the end of the row. They argue
that frequently updated columns should be placed after Varchar columns
because varyling length rows are logged from the first updated byte to the
end of the row. Thus why log a whole varchar? However the admin guide
tells a slightly different story:

"For varying-length rows, data is logged from the first changed byte to the
end of the last updated column. (A varying-length row contains one or more
varying-length columns.)"

Based on this statement as long as the Varchar is not between updated
columns it won't get logged.

Question #1: For varying length rows is the update logged from the first
changed byte to the end of teh row or from teh first changed byte to the
last upadted column.

Interestingly, the admin guide then goes on to say:

"Keep all frequently updated
columns near the end of the row.
However, if only fixed-length
columns will be updated, keep
those columns close to each other
at the beginning of the row."

Question #2: What is the point of this statement? If I am updating only
fixed length columns I can keep the varchar at the end. So, why not simply
say Varchars should always be at the end of the row? Based on the statemtn
above it's just saying keep updated varchars after non-updated varchar, or
am I missing something here...

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm




*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************


---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Jorg Lueke

Re: Placement of the Varchar Columns
(in response to Rob --- Sr. Database Administrator --- CFS Crane)
Rob,

Thanks for teh powerpoint. I see that part of the differences comes from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In your
power point it states that the entire row from the update back is logged
when there is a change in row length, an edit proc, or compression. On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL changes to
that data is compressed. Thus, you can expect less logging for insertions and
deletions; the amount of logging for updates varies. Applications that are
sensitive to log-related resources can experience some benefit with compressed
data."

It makes no indication in this spot that the whole row would be logged. Is
this documented elsewhere?

Thanks

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Rob --- Sr. Database Administrator --- CFS Crane

Re: Placement of the Varchar Columns
(in response to Jorg Lueke)
I will look for the reference, but believe it is associated to the fact that compression is at a row level not column level, so the entire row is logged.

-----Original Message-----
From: Jorg Lueke [mailto:[login to unmask email]
Sent: Thursday, December 23, 2004 11:38 AM
To: [login to unmask email]; Crane, Rob --- Sr. Database Administrator --- CFS
Subject: Re: Placement of the Varchar Columns

Rob,

Thanks for teh powerpoint. I see that part of the differences comes from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In your
power point it states that the entire row from the update back is logged
when there is a change in row length, an edit proc, or compression. On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL changes to
that data is compressed. Thus, you can expect less logging for insertions and
deletions; the amount of logging for updates varies. Applications that are
sensitive to log-related resources can experience some benefit with compressed
data."

It makes no indication in this spot that the whole row would be logged. Is
this documented elsewhere?

Thanks




*******************************************************
This message contains information that is confidential
and proprietary to FedEx Freight or its affiliates.
It is intended only for the recipient named and for
the express purpose(s) described therein.
Any other use is prohibited.
*******************************************************

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Michael Ebert

Re: Placement of the Varchar Columns
(in response to Rob --- Sr. Database Administrator --- CFS Crane)
I *suspect* that DB2 handles logging in this way (it's how I would do it,
and I don't really have any doubt):

DB2 compares the "before" and "after" images of the row. It starts
comparing from the start of both rows until it finds the first difference.
Then it continues scanning and remembers the last difference. The bytes
between the first and the last difference are logged ("before" and "after"
parts as well as start position). Now consider:

If the row changes in length, then there will definitely be a mismatch once
the shorter row ends. Thus the last changed byte would be the end of the
row. (This could be avoided by scanning from the front of the row for the
first mismatch, and from the end backwards for the last mismatch, but
there's a machine instruction for the forward compare if I remember right,
while a backward compare would not mesh well with how memory is accessed -
it would probably carry a heavy performance penalty). Also remember that a
VARCHAR starts with a length byte or halfword, so if you change the length,
the first difference will be right at the beginning, even if you only add a
char at the end.
Compression creates variable-length rows, and any update is likely to
change the length of the row, resulting in the previous case. Also, because
compression replaces several bytes with fewer bytes, your difference would
most likely start earlier in the row, possibly even at the start. This
might get telescoped into "the whole row gets logged", which I suspect is
not true. Editproc-encoded rows would be a similar case.

Thus you you would get the rule to place updated columns after non-updated
columns, generally ordering them from "least likely to be updated" to "most
likely to be updated" at the very end. VARCHARs complicate the matter,
because they carry a higher logging cost in case of length change,
depending on how far it is from the end of the row. Thus it might be more
efficient to place a less often updated VARCHAR after a more often updated
CHAR. If you want to tune your update logging down to that level, you need
figures on how likely each column is to be updated (for VARCHARs, whether
or not the length changes); also you need to know which columns are likely
to be updated together - a sort of update cross-correlation matrix. Then
try all permutations of column ordering to see which one would result in
the least logging (sounds like an NP-problem to me)....

If you have to distill a few docu guidelines from that simple algorithm
with its complicated consequences, then you're likely to end up with the
messy rules you're currently trying to figure out.

Dr. Michael Ebert
DB2 Database Administrator
aMaDEUS Data Processing
Erding / Munich, Germany




From: Jorg Lueke <[login to unmask email]>@IDUGDB2-L.ORG on 23-12-2004
12:37 CST

Please respond to DB2 Database Discussion list at IDUG
<[login to unmask email]>

Sent by: DB2 Data Base Discussion List <[login to unmask email]>



To: [login to unmask email]


cc:






Subjec Re: Placement of the Varchar
t: Columns








Rob,

Thanks for teh powerpoint. I see that part of the differences comes from
DB2 V6 to DB2 V7. You do note an exception when logging is used. In your
power point it states that the entire row from the update back is logged
when there is a change in row length, an edit proc, or compression. On the
latter point, the admin guide merely states:

"v Logging effects
If a data row is compressed, all data that is logged because of SQL changes
to
that data is compressed. Thus, you can expect less logging for insertions
and
deletions; the amount of logging for updates varies. Applications that are
sensitive to log-related resources can experience some benefit with
compressed
data."

It makes no indication in this spot that the whole row would be logged. Is
this documented elsewhere?

Thanks

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Jorg Lueke

Re: Placement of the Varchar Columns
(in response to Michael Ebert)
Yes, that is the problem with rules, guidelines, processeses and procedures.
IBM has distilled this in the following way:

"For updates, the amount of data that is logged for update depends
on whether the row contains all fixed-length columns or not.
For fixed-length rows, changes are logged only from the beginning
of the first udpated column to the end of the last updated column.
For varying-length rows, data is logged from the first changed
byte to the end of the last updated column."


If these two statements are true what is the real difference in logging? I
see the possibility that a Varchar update would need to log less data since
it starts at the first changed byte rather than the first chnaged column.

The one thing it seems one should not do is surround a Varchar field with
two columns that are often updated as then the whole span must be logged.
But otherwise, even if the row length changes, only the changed bytes are
logged in either case. The statement that changes are logged to the end of
the row seems only to have been applicable in V6 and/or prior versions.

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Jorg Lueke

Re: Placement of the Varchar Columns
(in response to Jorg Lueke)
I just received the following clarifying statement

Basically, if all the columns are fixed length, then DB2 will log the changes
from the first column changed to the last. If an updated
column has it's length change, then, in effect, it's causing a "push" to
all the subsequent columns which will cause it to log all the data from
the first changed byte to the end of the row. Does that make it a
little clearer?

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

[login to unmask email]

Re: Placement of the Varchar Columns
(in response to Jorg Lueke)
Jorg,

Beware the difference between a "changed column" and a "changed byte",
especially in the case of COMPRESS NO tablespaces. In this case, a "row"
consists of a 6-byte prefix concatenated with the column data. If you
change the length of a VARCHAR column value, this changes the total length
of the row . . . which is stored in the row prefix. In this case, the
"first changed byte" of the row is in the Prefix. Logging is then from
this first changed byte to the end of the changed column.

One can then conclude that, if the length of the row is going to change on
SQL Update, to reduce logging one wants the updated columns at the
beginning of the row.

Lock Lyon
Compuware Corp




Jorg Lueke <[login to unmask email]>
Sent by: DB2 Data Base Discussion List <[login to unmask email]>
12/27/2004 10:01 AM
Please respond to
DB2 Database Discussion list at IDUG <[login to unmask email]>


To
[login to unmask email]
cc

Subject
Re: [DB2-L] Placement of the Varchar Columns






Yes, that is the problem with rules, guidelines, processeses and
procedures.
IBM has distilled this in the following way:

"For updates, the amount of data that is logged for update depends
on whether the row contains all fixed-length columns or not.
For fixed-length rows, changes are logged only from the beginning
of the first udpated column to the end of the last updated column.
For varying-length rows, data is logged from the first changed
byte to the end of the last updated column."


If these two statements are true what is the real difference in logging? I
see the possibility that a Varchar update would need to log less data
since
it starts at the first changed byte rather than the first chnaged column.

The one thing it seems one should not do is surround a Varchar field with
two columns that are often updated as then the whole span must be logged.
But otherwise, even if the row length changes, only the changed bytes are
logged in either case. The statement that changes are logged to the end
of
the row seems only to have been applicable in V6 and/or prior versions.




---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Jorg Lueke

Re: Placement of the Varchar Columns
(in response to LL581@DAIMLERCHRYSLER.COM)
On Mon, 27 Dec 2004 18:18:06 -0500, [login to unmask email] wrote:

>Jorg,
>
>Beware the difference between a "changed column" and a "changed byte",
>especially in the case of COMPRESS NO tablespaces. In this case, a "row"
>consists of a 6-byte prefix concatenated with the column data. If you
>change the length of a VARCHAR column value, this changes the total length
>of the row . . . which is stored in the row prefix. In this case, the
>"first changed byte" of the row is in the Prefix. Logging is then from
>this first changed byte to the end of the changed column.
>
>One can then conclude that, if the length of the row is going to change on
>SQL Update, to reduce logging one wants the updated columns at the
>beginning of the row.
>
>Lock Lyon
>Compuware Corp
>
Lock,

If that statement is true along with the previous message about a changed
row length causing a push and logging to the end of the row then it would
seem that teh whole row would get logged sicne teh first byte is in the
prefix and the changed length of the row causes the whole row to be
logged.

More food for thought...

Jorg
>
>
>Jorg Lueke <[login to unmask email]>
>Sent by: DB2 Data Base Discussion List <[login to unmask email]>
>12/27/2004 10:01 AM
>Please respond to
>DB2 Database Discussion list at IDUG <[login to unmask email]>
>
>
>To
>[login to unmask email]
>cc
>
>Subject
>Re: [DB2-L] Placement of the Varchar Columns
>
>
>
>
>
>
>Yes, that is the problem with rules, guidelines, processeses and
>procedures.
> IBM has distilled this in the following way:
>
>"For updates, the amount of data that is logged for update depends
>on whether the row contains all fixed-length columns or not.
>For fixed-length rows, changes are logged only from the beginning
>of the first udpated column to the end of the last updated column.
>For varying-length rows, data is logged from the first changed
>byte to the end of the last updated column."
>
>
>If these two statements are true what is the real difference in logging? I
>see the possibility that a Varchar update would need to log less data
>since
>it starts at the first changed byte rather than the first chnaged column.
>
>The one thing it seems one should not do is surround a Varchar field with
>two columns that are often updated as then the whole span must be logged.
>But otherwise, even if the row length changes, only the changed bytes are
>logged in either case. The statement that changes are logged to the end
>of
>the row seems only to have been applicable in V6 and/or prior versions.
>
>
>
>
>---------------------------------------------------------------------------
------
>Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and
home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page
select "Join or Leave the list". The IDUG DB2-L FAQ is at
http://www.idugdb2-l.org. The IDUG List Admins can be reached at DB2-L-
[login to unmask email] Find out the latest on IDUG conferences at
http://conferences.idug.org/index.cfm
>

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm

Roger Miller

Re: Placement of the Varchar Columns
(in response to Jorg Lueke)
The prefix was handled quite a while ago, so that is not a problem. As
you noted, what you have is summary statements. Taking a summary
statement and then presuming there are no exceptions is likely to cause
some erroneous conclusions.

Roger Miller

On Mon, 27 Dec 2004 18:18:06 -0500, [login to unmask email] wrote:

>Jorg,
>
>Beware the difference between a "changed column" and a "changed byte",
>especially in the case of COMPRESS NO tablespaces. In this case, a "row"
>consists of a 6-byte prefix concatenated with the column data. If you
>change the length of a VARCHAR column value, this changes the total length
>of the row . . . which is stored in the row prefix. In this case, the
>"first changed byte" of the row is in the Prefix. Logging is then from
>this first changed byte to the end of the changed column.
>
>One can then conclude that, if the length of the row is going to change on
>SQL Update, to reduce logging one wants the updated columns at the
>beginning of the row.
>
>Lock Lyon
>Compuware Corp
>
>
>Jorg Lueke <[login to unmask email]>
>Sent by: DB2 Data Base Discussion List <[login to unmask email]>
>12/27/2004 10:01 AM
>Please respond to
>DB2 Database Discussion list at IDUG <[login to unmask email]>
>
>
>To
>[login to unmask email]
>cc
>
>Subject
>Re: [DB2-L] Placement of the Varchar Columns
>
>
>Yes, that is the problem with rules, guidelines, processeses and
>procedures.
> IBM has distilled this in the following way:
>
>"For updates, the amount of data that is logged for update depends
>on whether the row contains all fixed-length columns or not.
>For fixed-length rows, changes are logged only from the beginning
>of the first udpated column to the end of the last updated column.
>For varying-length rows, data is logged from the first changed
>byte to the end of the last updated column."
>
>
>If these two statements are true what is the real difference in logging? I
>see the possibility that a Varchar update would need to log less data
>since
>it starts at the first changed byte rather than the first chnaged column.
>
>The one thing it seems one should not do is surround a Varchar field with
>two columns that are often updated as then the whole span must be logged.
>But otherwise, even if the row length changes, only the changed bytes are
>logged in either case. The statement that changes are logged to the end
>of
>the row seems only to have been applicable in V6 and/or prior versions.
>

---------------------------------------------------------------------------------
Welcome to the IDUG DB2-L list. To unsubscribe, go to the archives and home page at http://www.idugdb2-l.org/archives/db2-l.html. From that page select "Join or Leave the list". The IDUG DB2-L FAQ is at http://www.idugdb2-l.org. The IDUG List Admins can be reached at [login to unmask email] Find out the latest on IDUG conferences at http://conferences.idug.org/index.cfm