DB2 for z/OS BMC Image Copy

Balasubramaniyan Rengan

DB2 for z/OS BMC Image Copy

Hi List,

We are running DB2 for z/OS V9 in CM with BMC Utilities. We recently had an incident where a BMC image copy was run against one of the busiest production tablespaces during business hours without any SHRLEVEL option specified. The copy took the default, SHRLEVEL REFERENCE, and issued a STOP in the UTILTERM phase. Because a long-running batch job was holding an MDEL lock on that tablespace, the STOP could not complete and the tablespace went into STOPP status. The TSO IDs that issued a DISPLAY on that tablespace via BMC Catalog Manager hung, and we couldn't cancel them. Finally, we checked the status through Omegamon and cancelled the long-running job; the tablespace then went to STOP status and we started it manually.

This incident prompted many questions.

1) Why does BMC default SHRLEVEL to REFERENCE when CHANGE is available? If I want a consistent copy, I will code SHRLEVEL explicitly anyway. It would be great to have the default changed to SHRLEVEL CHANGE, unless there is a strong reason to keep REFERENCE as the default.

2) Why did Catalog Manager hang and refuse to show the status of the object?

3) The object was in STOPP status for almost an hour. I read in the command reference that if the STOP cannot get the drain locks on the first request, it repeatedly tries again, and the command fails if it times out more than 15 times trying to get the locks. I wonder why this did not occur here, and how long does the STOP keep trying to get the locks on each attempt?

4) Would cancelling the thread of the STOP command (020.STOPDB09) help in a similar situation, rather than cancelling the batch job that was holding up the STOP?
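A small sketch may help frame question 3. It only does the arithmetic implied by the "more than 15 timeouts" wording quoted above; the per-try timeout is not stated anywhere in this thread, so `per_try_timeout_s` is a hypothetical input, and both function names are made up for illustration.

```python
def stop_give_up_after(per_try_timeout_s: float, max_timeouts: int = 15) -> float:
    """Seconds until the STOP would fail, if every lock request times out.

    Assumption: "fails if it times out more than 15 times" means the
    command gives up on timeout number 16.
    """
    return (max_timeouts + 1) * per_try_timeout_s


def implied_per_try_timeout(observed_wait_s: float, max_timeouts: int = 15) -> float:
    """Per-try timeout that would be needed for the STOP to still be
    retrying after observed_wait_s seconds."""
    return observed_wait_s / (max_timeouts + 1)


if __name__ == "__main__":
    # Hypothetical 60-second timeout per attempt.
    print(stop_give_up_after(60))
    # Roughly one hour in STOPP, as reported in the post.
    print(implied_per_try_timeout(3600))
```

Under these assumptions, an object that stays in STOPP for about an hour either had a per-try timeout of several minutes or was not subject to the documented retry limit at all, which is the point the question is driving at.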

Thanks in advance.

Regards

Bala

 

 

 

Bill Pothoff

RE: DB2 for z/OS BMC Image Copy
(in response to Balasubramaniyan Rengan)

Hi Bala,

I'll throw out a comment for #1 - We use SHRLEVEL REFERENCE as the default to be consistent with IBM's copy syntax.  Consistency is good, no? 

We do have to be careful about changing defaults, because we don't want to affect the function of existing JCL.  Customers tend to get upset, and rightfully so, if we make changes that cause them to have to review and update their existing jobs to keep them doing what they had been doing prior to the change.

Bill Pothoff

BMC Software Technical Support - DB2 Backup & Recovery


Balasubramaniyan Rengan

RE: DB2 for z/OS BMC Image Copy
(in response to Bill Pothoff)

Hi Bill,

Thanks for your reply. Yes, consistency is good :), but in this case it resulted in an unplanned outage. We are trying to find ways to minimize the risk of a similar situation happening again, and one way we can think of is to have less restrictive options as defaults. We would also like to know whether defaults can be made vendor specific (on request).

Regards

Bala

Adam Baldwin

RE: DB2 for z/OS BMC Image Copy
(in response to Balasubramaniyan Rengan)

Hi Bala. A couple of points. Copies taken with SHRLEVEL REFERENCE help to optimize recovery processing as well as improving the performance of the actual copy job. REFERENCE is a good default, and the user has the ability to override it as and when required. You say that "consistency is good" but that "it resulted in an unplanned outage". Surely the outage resulted from the fact that the copy job was run without coding the necessary options. I don't want to step on any ISV's toes, but product defaults are, well, defaults. Where there are options with a default value, the user can tailor and specify the values as required. To enforce shop-specific parameters one can always use procs etc.

IMHO, SHRLEVEL REFERENCE is the correct default to have for an image copy job.

Cheers, Adam

Mike Vaughan

DB2 for z/OS BMC Image Copy
(in response to Balasubramaniyan Rengan)
A couple other folks have commented on the SHRLEVEL default, but I didn't want to overlook the STOP that was issued, which I believe was the larger issue here. The STOP shouldn't be needed just because the copy is running shrlevel reference, but I believe BMC Copy issues a STOP to reset modified page bits. If this was the reason for the STOP then you should be able to eliminate it by including "RESETMOD NO" on your copy statement.

STOPP is a fairly nasty status and it's possible to get into situations like you mentioned where a DISPLAY just hangs waiting behind the STOP. I believe V9 was the first release that did allow the STOP to be cancelled, so that should have been an option to get out of it.

From: Balasubramaniyan Rengan [mailto:[login to unmask email]
Sent: Friday, August 03, 2012 12:36 PM
To: [login to unmask email]
Subject: [DB2-L] - DB2 for z/OS BMC Image Copy



Balasubramaniyan Rengan

RE: DB2 for z/OS BMC Image Copy
(in response to Mike Vaughan)

Hi All,

Thanks for your valuable comments on this topic. I thought of concluding this post with the findings.

Hi Adam, Yes. We respect the defaults, and we have updated our documents to code the SHRLEVEL parameter explicitly.

Hi Mike, Thanks for your suggestions. The RESETMOD option is set to YES at the moment, and we plan to set it to NO so that we can avoid the STOP. Also, cancelling the thread of the STOP does help to get out of STOPP.

Thanks to Bill for helping me to find answers for the queries and for suggesting better utility options.

Regards

Bala 
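For anyone wanting to enforce the two conclusions above (code SHRLEVEL explicitly, code RESETMOD NO), here is a minimal sketch of a check that could be run over copy control statements before a job is submitted. It is not a BMC or IBM tool. It assumes each statement starts on a line beginning with the word COPY and continues on the following lines, and it ignores comments and column rules. The function names and the sample database and tablespace names are hypothetical.

```python
import re

# A statement is assumed to start on a line whose first word is COPY
# (COPYDDN and similar keywords do not match because of the \b).
COPY_START = re.compile(r"^\s*COPY\b", re.IGNORECASE)


def split_statements(text: str) -> list[str]:
    """Join continuation lines so each COPY statement is one string."""
    statements, current = [], None
    for line in text.splitlines():
        if COPY_START.match(line):
            if current is not None:
                statements.append(current)
            current = line.strip()
        elif current is not None and line.strip():
            current += " " + line.strip()
    if current is not None:
        statements.append(current)
    return statements


def check_copy_statement(stmt: str) -> list[str]:
    """Return the shop-rule violations found in one COPY statement."""
    problems = []
    if not re.search(r"\bSHRLEVEL\s+\w+", stmt, re.IGNORECASE):
        problems.append("SHRLEVEL not coded explicitly")
    if not re.search(r"\bRESETMOD\s+NO\b", stmt, re.IGNORECASE):
        problems.append("RESETMOD NO not coded")
    return problems


def lint(text: str) -> list[tuple[str, list[str]]]:
    """Return (statement, problems) for every statement with violations."""
    findings = []
    for stmt in split_statements(text):
        problems = check_copy_statement(stmt)
        if problems:
            findings.append((stmt, problems))
    return findings


# Hypothetical control cards: the second statement relies on defaults.
SAMPLE = """\
COPY TABLESPACE DBPROD01.TSORDERS
     SHRLEVEL CHANGE RESETMOD NO
COPY TABLESPACE DBPROD01.TSITEMS
"""

if __name__ == "__main__":
    for stmt, problems in lint(SAMPLE):
        print(stmt, "->", "; ".join(problems))
```

A check like this only reports that an option is missing. Whether SHRLEVEL CHANGE or REFERENCE is right for a given object is still a decision for the DBA.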

Tsui Yuk Kai

DB2 image copy
(in response to Balasubramaniyan Rengan)
Hi all,

Does anyone know why the elapsed time of a DB2 image copy to disk is almost twice as long with PPRC as without it? Is it caused by latency, or are there other factors to consider?
Is there anything we need to tune? We use HDS disk. Many thanks.

Roy Boxwell

DB2 image copy
(in response to Tsui Yuk Kai)
Last time I saw this it was a “problem” with the disk setup and release parameters (asynch or synch?). I cannot remember the details, but basically the first copy is waiting for the 2nd copy to complete before continuing. If you do not worry about the chance of two bad copies then you can change the parms so it returns when 1st copy is done.

Roy Boxwell
SOFTWARE ENGINEERING GmbH and SEGUS Inc.
-Product Development-

On 13 Feb 2018, at 22:44, Tommy Tsui <[login to unmask email]<mailto:[login to unmask email]>> wrote:


Avram Friedman

RE: DB2 image copy
(in response to Tsui Yuk Kai)

PPRC is a hardware solution which provides rapid and accurate disaster recovery as well as a solution to workload movement and device migration. Updates made on the primary DASD volumes are synchronously shadowed to the secondary DASD volumes. The local storage subsystem and the remote storage subsystem are connected through a communications link called a PPRC path. 

Notice the word "synchronously".
The I/O is not complete until the copy to the secondary volume is complete.
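To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the copy issues its writes one after another with no overlap, which real utilities and storage subsystems do not strictly do, and all timings are invented for illustration rather than measured on HDS or any other hardware.

```python
def copy_elapsed_s(n_writes: int, local_write_ms: float, remote_ack_ms: float = 0.0) -> float:
    """Elapsed seconds for n_writes strictly sequential writes.

    With synchronous mirroring, each write also waits remote_ack_ms for
    the secondary to acknowledge before the I/O is posted complete.
    """
    return n_writes * (local_write_ms + remote_ack_ms) / 1000.0


if __name__ == "__main__":
    writes = 100_000          # hypothetical number of write I/Os
    local_ms = 1.0            # hypothetical local write service time
    remote_ms = 1.0           # hypothetical extra wait for the secondary

    without_pprc = copy_elapsed_s(writes, local_ms)
    with_pprc = copy_elapsed_s(writes, local_ms, remote_ms)
    print(without_pprc, with_pprc, with_pprc / without_pprc)
```

In this simplified model, the elapsed time doubles whenever the extra wait for the secondary is about as long as the local write itself, so a 2x difference is consistent with synchronous mirroring overhead alone.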

 

Avram Friedman
DB2-L hall of fame contributor
DB2-L acting administrator
