Looking for best practices running DB2 HADR on an AWS multi-AZ deployment

Rui Chen

Looking for best practices running DB2 HADR on an AWS multi-AZ deployment

Hi DB2-experts,

While having fun playing with HADR on AWS, we bumped into a bunch of interesting problems. We searched and experimented for a while but found limited publicly available resources to refer to. We hope you don't mind sharing some insights/suggestions with us covering the following aspects:

1. Recommendations on whether or not to use a virtual IP (VIP):

    1.1 On AWS, we realize IBM.ServiceIP may not be able to manage the VIP, for two reasons:
        1.1.1 a single subnet can't span multiple AZs;
        1.1.2 TSA doesn't check whether the VIP is actually assigned to the host.

    1.2 IBM.Application may be able to manage VIP failover in an AWS multi-AZ deployment through customized scripting, but the failover operation itself bounds the RTO and could be a SPOF in its own right (a rough sketch of the route-table approach we have in mind follows below). Alternatively, if we don't use a VIP, we'd like to cover as many caveats as possible. Anything else we should watch out for, besides split-brain and client re-route complexity? We are aware a VIP helps avoid split-brain and reduces client re-route complexity, but the same goals could be accomplished through careful orchestration without one. By the way, we may still want to use ROS (yes, we have been warned about the ROS limitations, but since we are paying for it anyway, we can't resist the temptation to use it...), so ACR/ClientAffinity is probably not the final solution.
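
    To make the scripted-VIP idea above concrete, here is a minimal sketch (an assumption on our side, not a tested recipe) of what the start/failover action of a custom IBM.Application resource might call. It assumes the usual AWS workaround for the single-subnet limitation: an "overlay" service IP outside the VPC CIDR whose /32 route in the client-facing route table is repointed at the new primary. The route table ID, overlay CIDR and instance ID below are placeholders, and the script re-reads the route afterwards to cover the "TSA doesn't verify the VIP" concern:

        # Sketch of a route-table-based "VIP" for multi-AZ HADR failover.
        # The overlay IP (outside the VPC CIDR) is routed to whichever node
        # currently acts as primary. Route table ID, overlay CIDR and the
        # instance ID are placeholders, not values from a real environment.
        import boto3

        ec2 = boto3.client("ec2", region_name="us-east-1")

        ROUTE_TABLE_ID = "rtb-0123456789abcdef0"  # route table used by the client subnets
        OVERLAY_CIDR   = "192.168.200.10/32"      # "VIP" outside the VPC CIDR block
        NEW_PRIMARY_ID = "i-0aaaabbbbccccdddd0"   # instance that just took over as primary

        def repoint_overlay_ip(route_table_id, cidr, instance_id):
            """Point the overlay /32 route at the new primary instance."""
            ec2.replace_route(
                RouteTableId=route_table_id,
                DestinationCidrBlock=cidr,
                InstanceId=instance_id,
            )

        def verify_overlay_ip(route_table_id, cidr, instance_id):
            """Confirm the route really lands on the expected instance."""
            rts = ec2.describe_route_tables(RouteTableIds=[route_table_id])
            for route in rts["RouteTables"][0]["Routes"]:
                if route.get("DestinationCidrBlock") == cidr:
                    return route.get("InstanceId") == instance_id
            return False

        if __name__ == "__main__":
            repoint_overlay_ip(ROUTE_TABLE_ID, OVERLAY_CIDR, NEW_PRIMARY_ID)
            if not verify_overlay_ip(ROUTE_TABLE_ID, OVERLAY_CIDR, NEW_PRIMARY_ID):
                raise SystemExit("overlay IP is not pointing at the new primary")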

2. Recommendations on log file management, assuming a cheap and performant shared storage solution is not available in a multi-AZ AWS deployment:

    2.1 ARCHIVE LOG protection:

        What's the best practice for making archive logs always available to all HADR nodes, in a way that survives a single-AZ EBS failure? I can think of mounting the logarchmeth1/2 target on EFS or a third-party solution, or even using file-system/block-storage replication (e.g. DRBD), but that would be quite a big change for us... We could set up customized log shipping (a rough sketch follows below), but that still doesn't protect archive logs that haven't been shipped yet... In our current setup, archive logs may be required by the QCapture program, or when adding an extra standby.
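
        One variant of the customized log shipping mentioned above, just as a sketch (the bucket name, prefix and archive path are made-up placeholders): a small sweeper on each node that copies completed archive logs from the LOGARCHMETH1 disk target into an S3 bucket, from which the other AZ, the QCapture host, or a freshly added standby could retrieve them. As noted, this still doesn't protect logs that haven't been archived yet:

            # Sketch of "customized log shipping": sweep completed archive logs
            # from the local LOGARCHMETH1 disk target into S3 so the other AZ
            # (or a new standby) can fetch them. Bucket, prefix and archive path
            # are placeholders for illustration only.
            import os
            import boto3

            s3 = boto3.client("s3")

            ARCHIVE_DIR = "/db2/archlogs/SAMPLE/NODE0000/LOGSTREAM0000/C0000000"  # example layout
            BUCKET = "my-db2-archive-logs"                                        # hypothetical bucket
            PREFIX = "hadr/SAMPLE/"

            def ship_archive_logs(archive_dir, bucket, prefix):
                """Upload any archive log not yet in S3; never delete anything locally."""
                existing = {
                    obj["Key"]
                    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket, Prefix=prefix)
                    for obj in page.get("Contents", [])
                }
                for name in sorted(os.listdir(archive_dir)):
                    if not name.endswith(".LOG"):
                        continue  # skip anything that is not an archive log file
                    key = prefix + name
                    if key not in existing:
                        s3.upload_file(os.path.join(archive_dir, name), bucket, key)

            if __name__ == "__main__":
                ship_archive_logs(ARCHIVE_DIR, BUCKET, PREFIX)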

    2.2 ACTIVE LOG protection:

        MIRRORLOGPATH is an option, but we probably won't go for it due to the performance penalty. We understand active log entries should be available on the principal standby while in PEER state anyway, but just in case HADR falls out of PEER state... (a small monitoring sketch follows below)
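
        One thing we are considering for that out-of-PEER window is a watchdog that polls MON_GET_HADR on the primary and alerts (or triggers extra protection) as soon as the pair leaves PEER or the log gap grows. A minimal sketch using the ibm_db Python driver, with placeholder credentials and the alerting reduced to a print:

            # Sketch of a PEER-state watchdog: poll MON_GET_HADR and raise an
            # alert as soon as HADR drops out of PEER or the log gap grows.
            # Connection string is a placeholder; alerting is just a print here.
            import ibm_db

            CONN_STR = "DATABASE=SAMPLE;HOSTNAME=primary-host;PORT=50000;PROTOCOL=TCPIP;UID=db2inst1;PWD=secret;"
            MAX_LOG_GAP_BYTES = 64 * 1024 * 1024  # arbitrary illustrative threshold

            def hadr_status(conn):
                """Return (role, state, log gap in bytes) for the first standby."""
                sql = ("SELECT HADR_ROLE, HADR_STATE, HADR_LOG_GAP "
                       "FROM TABLE(MON_GET_HADR(NULL))")
                stmt = ibm_db.exec_immediate(conn, sql)
                row = ibm_db.fetch_assoc(stmt)
                return row["HADR_ROLE"], row["HADR_STATE"], int(row["HADR_LOG_GAP"])

            def check_once():
                conn = ibm_db.connect(CONN_STR, "", "")
                try:
                    role, state, gap = hadr_status(conn)
                    if state != "PEER" or gap > MAX_LOG_GAP_BYTES:
                        print(f"ALERT: HADR {role} in state {state}, log gap {gap} bytes")
                finally:
                    ibm_db.close(conn)

            if __name__ == "__main__":
                check_once()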

3. Best practice references on multi-AZ HADR deployment in general.... 

Really appreciate your inputs!

 

Bests,

Rui

 

BTW, we use DB2 V10.5 FP8 on Linux.


Philip Gunning

Looking for best practices running DB2 HADR on an AWS multi-AZ deployment
(in response to Rui Chen)
I think you will find a lot of info on the wiki at

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/DB2HADR/page/Welcome. You can also search for presentations on this from Dale McInnis and others. HADR is pretty straightforward. You’ll need to choose ACR or VIP, not both. TSAMP is fairly complex, and the best thing to do is experiment with it. There is a good white paper on how to set it up, although it is old. Good luck.
> IBM Champion for Analytics
>
> Certified Information Systems Security Professional(CISSP)
>
> Certification Number 539059
>
> Certified Advanced DB2 DBA v10.5
>
> Certified Database Administrator, DB2 11.1
>
> IBM DB2 LUW Support Page -- https://www.ibm.com/analytics/us/en/technology/db2/db2-linux-unix-windows.html
>
> Skype: DB2LUW
>
> Twitter: DB2LUW
>
> Direct +1.610.451.5801
>
> IDUG DB2-L Hall of Fame
>
> www.philipkgunning.com
>
> IBM Business Partner
Sent from my iPhone


Rui Chen

RE: Looking for best practices running DB2 HADR on an AWS multi-AZ deployment
(in response to Philip Gunning)

Hi Philip,

Thanks a lot for your input. No doubt the resources you mentioned are great and have helped us a lot in our HADR evaluation, along with various Redbooks, white papers and online forums. Unfortunately, we haven't found much reference material so far on setting up HADR (plus Q Replication, or any other replication tool, since ROS wouldn't satisfy our requirements due to its long list of limitations) in an AWS multi-AZ deployment.

We bumped into some critical caveats (mentioned in the original post) and are just trying to see whether the DB2 community has already been through them.

Anyway, thanks again, Philip. We'd be more than happy to share our findings with the DB2 community.

 

Bests,

Rui