Large table and index

Grace Chen

Large table and index
Hi List,
We have just started to create the data warehouse tablespaces for our
developers, and I am using DB2 Estimator to size them. I found that we have
a pretty big table that needs 1.6 GB, with a 2.1 GB index. This is because
the table has a variable-length column: for the table, DB2 Estimator assumes
the average column length is 50% of the maximum, while the index key is
sized at the maximum length.
Do you think it is reasonable to have such a big index? I am wondering what
the performance will be like. Do I need to partition it?
Any suggestions will be highly appreciated.
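(For illustration only: the table, row count, and column lengths below are
invented, not the actual warehouse table, but they show roughly why a padded
index can come out bigger than its table.)

-- Hypothetical example: a 10-million-row table with one VARCHAR(200)
-- column that is also indexed.
CREATE TABLE DWH.BIG_FACT
  (FACT_KEY   CHAR(10)     NOT NULL,
   LONG_DESC  VARCHAR(200) NOT NULL);

CREATE INDEX DWH.XBIGFACT1
  ON DWH.BIG_FACT (LONG_DESC);

-- Rough estimator arithmetic for the two objects:
--   table rows : 10 + (200 * 0.50) = 110 bytes per row      -> about 1.1 GB of data
--   index keys : 200 bytes (padded to the maximum) + RID    -> about 2.0 GB of index
-- so an index that is larger than its table is not unusual in this situation.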

Grace Chen
Technical Support
Credit Union Central Alberta Ltd.
Tel: (403) 258-5982
E-mail : [login to unmask email]



Robert Jans

Re: Large table and index
(in response to Grace Chen)
Grace, I haven't seen any replies yet, so I'll take a swipe at it. The
really obvious answer is "It depends". This _could_ be a very good candidate
for partitioning. You need to look at what the data is like and how the
users access it. If there are definite groupings in the data (like chequing
accounts, savings accounts, or credit union), it may be possible to use one
of those, or a combination of them, as the partitioning key, provided the
main access to the data follows a similar pattern. It wouldn't help much if
you partitioned the table but the users didn't/wouldn't/couldn't access the
data via the partitioning keys _most_ of the time.
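As a rough sketch of what that could look like (the database, table, and
limit-key values are all invented, and this assumes V6/V7-style
index-controlled partitioning):

-- Hypothetical 4-partition layout keyed on an account-type grouping.
CREATE TABLESPACE TSACCT IN DWHDB
  NUMPARTS 4;

CREATE TABLE DWH.ACCT_HIST
  (ACCT_TYPE  CHAR(2)      NOT NULL,   -- e.g. chequing / savings / credit
   ACCT_NO    CHAR(12)     NOT NULL,
   TXN_DATE   DATE         NOT NULL,
   TXN_DESC   VARCHAR(150) NOT NULL WITH DEFAULT)
  IN DWHDB.TSACCT;

-- The partitioning index doubles as the clustering index; queries that
-- supply ACCT_TYPE can then be limited to the matching partition(s).
CREATE INDEX DWH.XACCTHIST1
  ON DWH.ACCT_HIST (ACCT_TYPE, ACCT_NO)
  CLUSTER
  (PART 1 VALUES ('CH'),
   PART 2 VALUES ('CR'),
   PART 3 VALUES ('SV'),
   PART 4 VALUES ('ZZ'));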

Robert Jans
Albertson's Inc
<Standard Disclaimers Apply>


sushant dash

Re: Large table and index
(in response to Robert Jans)
Hello Grace/Robert, just thought to add something. It is true that
partitioning a table space does not always benefit the user, but a
well-designed and carefully considered approach can get real benefit out of
partitioning; there is always a trade-off.

With a well-considered partitioning and index design it is possible to get
page-range screening (PAGE_RANGE in the plan table). Here are some rules of
thumb that tempt us to partition a table and to think carefully about
variable-length fields; it would be good to have them reviewed by experts as
well.

When estimating the size, work out how many pages your tablespace will need.
Once the tablespace goes beyond roughly 100,000 pages, consider partitioning
it.

For variable-length columns, look at how much space you would actually save.
A common design practice is to use VARCHAR only where the stored data can
exceed about 32 characters; if you know a column can never exceed 32
characters, make it CHAR to avoid indirect references and page splits. Also
try to keep variable-length columns out of index keys; there are
enhancements from Version 6 onward that help when a variable-length column
is indexed (a small DDL sketch follows below). Look at how many indexes you
really need, and if you know the predefined ranges for your partitioning, it
is better to make that key both the primary key and the
partitioning/clustering key to get the most benefit from it. Also check how
many rows the tablespace will hold: if it goes beyond 100 million rows, it
is better to keep the number of indexes down (say, to two).

Consider all these points, confirm them with a second opinion, and proceed.
Hope it helps you out.
Dash
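
A tiny sketch of the CHAR-versus-VARCHAR point (names invented; the
32-character figure is only the rule of thumb above):

CREATE TABLE DWH.CUST_NOTE
  (CUST_ID    CHAR(12)     NOT NULL,                -- short and fixed: fine to index
   SHORT_NAME CHAR(20)     NOT NULL,                -- never exceeds 32 chars, so CHAR
   LONG_NOTE  VARCHAR(254) NOT NULL WITH DEFAULT);  -- genuinely variable data

-- Index only the fixed-length columns; a key on LONG_NOTE would be
-- padded to its full 254 bytes in every index entry.
CREATE INDEX DWH.XCUSTNOTE1
  ON DWH.CUST_NOTE (CUST_ID);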


Sanjeev (CTS) S

Re: Large table and index
(in response to sushant dash)
Grace,

Why not use the ZPARM RETVLCFK=YES (check the exact spelling of the ZPARM
option) for variable-length index keys? The question, however, is which
version of DB2 you are on: the option is available from V6 onward. This way
you can have the size of the index reduced from what it is now, and it may
be less than the table size if not all the columns are used in the index.
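
A small hypothetical sketch (names invented, reusing the made-up
DWH.BIG_FACT table from the first note in this thread; as far as I know the
option lets DB2 return a VARCHAR key column from the padded index, so a
query like this can be satisfied index-only):

-- Index that carries the variable-length column as part of its key.
CREATE INDEX DWH.XBIGFACT2
  ON DWH.BIG_FACT (FACT_KEY, LONG_DESC);

-- With the option set to YES this can run as index-only access; note that
-- the VARCHAR value comes back padded with blanks to its maximum length.
SELECT FACT_KEY, LONG_DESC
  FROM DWH.BIG_FACT
 WHERE FACT_KEY = 'A000000001';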

Dash,

I do agree with partitioning when the size is large, but I do not think (or
am not aware) that there is a guideline of 100,000 pages for partitioning.
Partitioning depends on size, access, and maintenance, all three at their
best trade-off.

HTH

With Best Regards,
Sanjeev

sushant dash

Re: Large table and index
(in response to Sanjeev (CTS) S)
Thanks, Sanjeev, for reviewing this and for the suggestion on indexing where there is a VARCHAR. I was waiting for comments from the experts, and I have also picked up some improvements; hope this helps Robert. To clarify the doubts you raised: although there is no written guideline for partitioning a tablespace, I have read presentations by Lawson and Richard in which, based on their experience, they use this figure, and when I discussed it with experts they also agreed with partitioning a tablespace once its size goes beyond about 100,000 pages. There is a similar rule of thumb for making a tablespace segmented once it goes beyond about 10,000 pages. I am implementing my design here along these guidelines, and it is being reviewed by experts from IBM; I have had no comments against this strategy, and they gave their nod to it. I would appreciate any other suggestions so that I can implement them as quickly as possible. These considerations only decide whether to partition; for the number of partitions we have to look at the number of CPs available and at whether most of our queries are I/O-bound or CPU-bound. Based on the statistics for those queries, and to gain parallelism, we can decide the number of partitions.
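On the parallelism point, a minimal illustration (using the hypothetical
DWH.ACCT_HIST table sketched earlier in the thread; for static SQL the
equivalent is binding the plan or package with DEGREE(ANY)):

-- Allow DB2 to consider CP query parallelism across the partitions
-- for dynamic SQL on this connection.
SET CURRENT DEGREE = 'ANY';

SELECT ACCT_TYPE, COUNT(*)
  FROM DWH.ACCT_HIST
 GROUP BY ACCT_TYPE;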
There is also another way of handling large tables with Version 7. Using UNION ALL inside a view, we can split the data into several smaller tables with the same attributes and then create a view over those tables with the union. The predicate in each leg of the view tells DB2 which underlying table holds the rows being asked for, so a query only touches the relevant table. This way there is effectively no limit on how much data can be stored: inserts and updates go against the individual (identically defined) tables, while selects go against the view, which keeps the data manageable. The merit of this approach is that we can store more than the 16 TB that is possible with a single table, and the identically defined tables can each carry different indexes. The disadvantage is that there can be more view materialization, but a good design can keep that under control.
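A minimal sketch of that approach (all names invented; UNION ALL inside a
view needs V7):

-- Identically defined "member" tables, one per year.
CREATE TABLE DWH.TXN_2000
  (TXN_YEAR  SMALLINT     NOT NULL,
   ACCT_NO   CHAR(12)     NOT NULL,
   TXN_DESC  VARCHAR(150) NOT NULL WITH DEFAULT);

CREATE TABLE DWH.TXN_2001
  (TXN_YEAR  SMALLINT     NOT NULL,
   ACCT_NO   CHAR(12)     NOT NULL,
   TXN_DESC  VARCHAR(150) NOT NULL WITH DEFAULT);

-- The view glues the member tables together; the TXN_YEAR predicate in
-- each leg is what lets a query against the view touch only the table
-- that can hold the requested rows.
CREATE VIEW DWH.TXN_ALL AS
  SELECT TXN_YEAR, ACCT_NO, TXN_DESC FROM DWH.TXN_2000 WHERE TXN_YEAR = 2000
  UNION ALL
  SELECT TXN_YEAR, ACCT_NO, TXN_DESC FROM DWH.TXN_2001 WHERE TXN_YEAR = 2001;

-- Reads go through the view; inserts and updates go to the member tables.
SELECT COUNT(*) FROM DWH.TXN_ALL WHERE TXN_YEAR = 2001;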
Please let me know if you have any other comments. I will also send more information for Robert once I am free.

Thanks
Dash
