Question on Secondary indexes on partitioned tables and contention

Kal Sub

Hi,

It's been a while since I dealt with NPSIs and DPSIs, and I would like to validate my understanding.

We are on z/OS Db2 v11.1 CM.

I have a table into which the application will write millions of rows each day, and I need to retain the data for a week. So I am partitioning with 8 partitions, with a daily rotate-partition job taking care of getting rid of the oldest day's data and setting up a partition for the next day.

If I have secondary indexes on the table, how much contention would the rotate-partition job cause for the online application? I thought creating them as DPSIs would eliminate contention, but I'm not sure by how much. Or does it not matter anymore whether I create them as a DPSI or an NPSI? The table will not be queried much; it is pretty much a logging table that may be queried on an ad hoc basis.
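
For concreteness, a rough sketch of the kind of DDL I have in mind (table, column, and index names here are made up for illustration; the limit keys are just examples):

```sql
-- Hypothetical names: a range-partitioned logging table with
-- one partition per retained day
CREATE TABLE APP_LOG
      (LOG_DATE  DATE          NOT NULL,
       LOG_TS    TIMESTAMP     NOT NULL,
       LOG_TEXT  VARCHAR(1000) NOT NULL)
  PARTITION BY RANGE (LOG_DATE)
      (PARTITION 1 ENDING AT ('2017-12-08'),
       PARTITION 2 ENDING AT ('2017-12-09'),
       -- partitions 3 through 7 omitted for brevity
       PARTITION 8 ENDING AT ('2017-12-15'));

-- DPSI: the PARTITIONED keyword gives one physical index
-- partition per data partition
CREATE INDEX APP_LOG_IX1 ON APP_LOG (LOG_TS) PARTITIONED;

-- NPSI alternative: without PARTITIONED, the index is a single
-- non-partitioned structure spanning all data partitions
-- CREATE INDEX APP_LOG_IX1 ON APP_LOG (LOG_TS);
```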

Alternatively, I was thinking of creating a simple non-partitioned daily table, which would be written to by the application and then unloaded/loaded into a history (partitioned) table using a REORG DISCARD process. This way the rotate partition on the history table would not impact the online application. It would also make the history table easier to manage, without having to worry about impacts to the application.

Looking forward to your suggestions.

Thanks

Regards

Kals

Michael Hannan

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Kal Sub)

Kal,

If you are in a Data Sharing situation, deletion of all rows from one partition will cause a lot of page consistency contention on a secondary index. A DPSI can largely solve this by confining the deletes to a partition not being accessed by the online processes.

So a DPSI will definitely be a lot better. Even in non-Data Sharing, the secondary index will have page latch waits and get very disorganised, probably indicating a Reorg, perhaps alleviated with freespace.

I am a fan of DPSIs. People are a little scared of them at times. They have their uses.


Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Kal Sub

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Michael Hannan)

Thanks for replying, Michael. 

We are not on Data Sharing.

The data that will be deleted is in the oldest partition, which would be 7 days old and would not be accessed at that time of night.

My concern was that when I do the rotate, the rows in that oldest partition would be physically deleted, causing the index to be updated and potentially causing some lock waits for the application. If I understand you right, creating that index as a DPSI will cause only the oldest index partition to be accessed during the rotate, thereby eliminating contention for the online app. Is that right?

I am also planning to do a LOAD REPLACE of that oldest partition with a dummy file before I do the rotate.
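
If it helps, this is the sort of job step I have in mind for emptying the oldest partition (ddnames and the table name are placeholders):

```
//* Empty partition 1 by loading it from a dummy input dataset
//SYSREC   DD DUMMY
//SYSIN    DD *
  LOAD DATA INDDN SYSREC LOG NO
    INTO TABLE APP_LOG PART 1 REPLACE
/*
```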

Regards

Kals



Michael Hannan

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Kal Sub)

Kal,

I have never done a "rotate", not being in a DBA role, so I avoided using that word. I have used tables with 12 partitions (one for each month), with partition 1 for January of each year, etc. I referred to the deletion of rows in one partition. That requires data page locks, row locks, or a mass delete lock (for the whole table).

Technically, indexes do not experience lock waits but rather latch waits. A DPSI would cause the applications to be latching pages not touched by the partition cleanup process.

A LOAD REPLACE of a partition can efficiently make a single partition unavailable briefly to clean it out, with similarities to a mass delete situation. Without a DPSI, an NPI would have many pages to update and would still experience index page latch waits. If the application is very quiet, with very few transactions at the time, then who cares?
 
Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Dec 14, 2017 - 05:40 AM (Europe/Berlin)

Jay Reavill

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Michael Hannan)
Kal,

What you are proposing is what we do, and it works well for us: dummy load prior to the rotate, and use DPSIs to reduce contention and overhead.

Thanks,
Jay

Jay Reavill
Database Administrator
NACard Solutions
Office: 727.227.2144
Cell: 727.215.5794


Jean Krasnick

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Jay Reavill)

To Kal (original poster):

Just thought I'd throw my 2 cents' worth on this issue.  I worked on apps that had several different designs for logging tables that were only rarely queried on an ad hoc basis.

One used a table similar to what you've described, with eight parts, one per day of the week and an extra part for special use.  We designed this well before rotating partitions were available.  We designated each part to be assigned to a day of the week:  1 for Sunday, 2 for Monday....7 for Saturday.  The application used a DAYOFWEEK function, I believe, to get the numerical value for the day of the week, and that was assigned to a column and the row was then inserted into the correct part.  We did not have a need to keep an entire week of data in the table, only 3-5 days.  So we used batch jobs at night that determined the day of the week, then conditionally executed steps that unloaded the prior day's data to a flat file (saved for a long time and could be used for ad hoc research) and then did a load replace dummy on the next day's part.  The job ran after midnight, so if the job ran on a Tuesday, for example, it would unload Monday's data and clear out Wednesday's part.  The online application was accessing (writing to) the current day's part, in this example Tuesday's part.
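
A sketch of how that insert logic might look (table and column names are hypothetical; DAYOFWEEK returns 1 for Sunday through 7 for Saturday):

```sql
-- The derived day number doubles as the partitioning column,
-- steering each row into the partition for the current day
INSERT INTO DAILY_LOG (DAY_NO, LOG_TS, LOG_TEXT)
  VALUES (DAYOFWEEK(CURRENT DATE), CURRENT TIMESTAMP, 'example row');
```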

An important part of this design is that we created no indexes on these tables.  There was no need for indexes for inserting rows (no need to check for uniqueness), and we didn't want to impact performance of our heavy inserts by having indexes that would support rare ad hoc queries.  The ad hoc query performance was not a high priority.

Another design we used was to have a separate table for each day of the week...same table name, but different qualifier.  This makes coding the application a little more tricky because it has to supply the correct table qualifier.  We used similar batch jobs to unload the prior day's data and clear out the next day's data.

The above two design examples do not meet your requirements for keeping the data in DB2 for 7 days, but I thought you might want to consider these designs as options.

Yet a third design we used was for apps that needed to retain the logging data for several months (but less than one year), using the concept of "disposable" tables.  We created either monthly or weekly tables; the monthly tables had 5 parts (one per week) (or were non-partitioned if the volume of data was low enough) and the weekly tables had 31 parts (one per day).  The tables actually had the month (or month & week) as part of the table name, and the application had to determine the current month (or month and week) and dynamically create the table name prior to executing SQL against the table.  These were web apps that used dynamic SQL anyway, so that part was not difficult.

We then had a series of batch jobs to maintain the tables.  Mid-month, we ran jobs to create the next month's tables (we'd read in template JCL and DDL and write out JCL and DDL with the correct month and/or week, then execute the generated JCL, which executed the generated DDL).  The DDL would drop the table before recreating it (meaning drop last year's January table prior to creating this year's, for example).  We also had jobs that would unload the data and archive it at the end of each month, and we'd run Reorgs on a weekly basis to compress the data (no good way of copying the compression dictionaries since we were starting out fresh each month or week).  We were able to keep the data for 12 months in DB2 (longer in our unloaded archives), so the data was available for research and ad hoc queries; some parts of the application also needed to read prior month's data.  There was no need for any type of purging utility as we simply dropped the tables when they were no longer needed, that's why we considered them "disposable".

In this design, if the online application was going to read the data, we would create necessary index(es), always as DPSI.  If not, we would not create any indexes. Again, no need for indexes for data that is write-only, if there is no need to check uniqueness, and this really improves insert performance. 

I hope this provides some food for thought.

Jean

Kal Sub

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Jay Reavill)

Thanks, Jay and Michael, for your responses. I am still inclined towards that approach for its simplicity. I will see how it goes.

Regards

Kals


Kal Sub

RE: Question on Secondary indexes on partitioned tables and contention
(in response to Jean Krasnick)

Thanks for that elaborate reply, Jean. We are also doing something like that at present, as this was set up in a much older version of DB2. But as part of a redesign, I was hoping to simplify using table partitioning and to eliminate some logic from the application code. The table will be queried ad hoc in order to resolve application issues, and I didn't want table scans on the partition being written to.

The point of contention (pardon the pun) was about contention during the rotate.

So the two choices are:

1. Create a simple table (with maybe a single index for querying purposes) which the application will insert into. A background job runs nightly after midnight to REORG DISCARD the table, discarding the previous day's data and leaving only the data from after midnight. Hopefully the switch phase will not time out. ROTATE PARTITION will run on the history table to delete the oldest day's data (about 6 million records), and the discard dataset will be loaded into the history table. The discard dataset will also be kept as a longer-term GDG dataset, retained for a few months for later querying if needed, just like you do. This option gives us flexibility in how we manage the history table without having to worry about the impact to the application.

2. Create a single partitioned table (with 1 or more indexes) with 8 partitions. Each night after midnight, the application will start writing data to the 8th partition, and a LOAD REPLACE with a dummy input will be done on the oldest partition, followed by a ROTATE PARTITION to change the limit keys. This option keeps the design to a single table, but it does mean the application could have contention issues during the ROTATE, depending on the indexes we create. As Michael said, at that time of night it probably doesn't matter, but I am conscious of reduced flexibility based on past experiences with earlier versions.
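
To make the two options concrete, these are roughly the statements each one hinges on (all database, table, and column names, and the limit key, are placeholders):

```sql
-- Option 1: discard yesterday's rows from the daily table while
-- reorganizing it; discarded rows land in the discard dataset,
-- which then feeds the LOAD into the history table
REORG TABLESPACE DBLOG.TSLOG
  DISCARD FROM TABLE DAILY_LOG
  WHEN (LOG_DATE < CURRENT DATE)

-- Option 2: after the LOAD REPLACE with a dummy input, rotate the
-- logically first (oldest) partition to last and reset its limit key
ALTER TABLE APP_LOG
  ROTATE PARTITION FIRST TO LAST
  ENDING AT ('2017-12-16') RESET;
```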

Regards

Kals

