The Book of Db2 Pacemaker – Chapter 1: Red pill or Blue pill?

Posted By: Alan Lee

If you have heard about Db2's support for Pacemaker and currently run Db2 in a highly available environment with either the integrated Tivoli System Automation for Multiplatforms (a.k.a. TSA) or another user-managed cluster manager, you might be struggling with this question: when is the right time to push my organization to take the red pill and embrace the new Pacemaker experience, rather than the blue pill and the status quo with TSA? It is a fair question and, make no mistake, you are not alone. My hope with this opening chapter of what is currently planned as a multi-part series is to reassure you that taking the red pill now is the way to go.

First and foremost, as many of my colleagues and I have presented at multiple IDUG conferences since 2020, Db2 has made the strategic decision to replace TSA with Pacemaker as the de facto integrated cluster manager. In other words, it is no longer a matter of IF you should adopt it; it is a question of WHEN. We are fully committed to completing this mission, and our incremental deliveries over the past 2+ years have hopefully silenced any doubters. Now, the WHEN part hinges mostly on two factors. The first is WHEN the specific high availability model currently employed by your organization (HADR, Mutual Failover (MF) with shared disk, DPF, and/or pureScale) will have Pacemaker support. The second is your organization's upgrade schedule for moving to a newer release that provides such support.

In this opening chapter, allow me to walk you through the many reasons behind this move in the following series of Q&A.


Q1: What’s the motivation to abandon a decade of investment in TSA?

Answer: Believe it or not, the reason is YOU!!! We listened to your many requests to bring our trusty HADR technology to the public cloud. A burst of such requests in early 2019 helped springboard the initial push. We responded by crafting an aggressive development plan, and we executed on it by first exploring what was possible with TSA. When that door was shut, we embarked on a journey to find an alternative enterprise-class, cloud-ready cluster manager before settling on Pacemaker. We released a closed beta that same year, in December 2019, followed by the first GA in June 2020 with Db2 11.5.4.0.


Q2: What is the rationale of choosing Pacemaker & Corosync?

Answer: There are too many good reasons to list them all, so here are a few key ones:

  1. Open source and cloud ready. This one is a no-brainer, as supporting cloud deployment is the primary driver for this effort. Being able to accomplish this by embracing open source technology is icing on the cake, enabling us to modernize our stack along the way.
  2. Enterprise-ready. This open source stack is not new and has close to two decades of deployment success in critical production environments.
  3. Feature rich, stable, and backed by credible contributors. Both Red Hat and SUSE repackage this stack into their own chargeable High Availability options. To keep your total cost of ownership down, we maintain and repackage our own version and bundle it with Db2 free of charge (i.e., no one-time or recurring fees).
  4. “One team to rule them all.” This “team” refers to the Db2 support (L2 and L3) teams. Given that we now own the entire software stack, support starts and ends with Db2. Time spent transferring cases back and forth between Db2 and TSA is no longer necessary.
  5. Simplified cluster software architecture. If it used to take 15 minutes to give an overview of the TSA/RSCT architecture, it takes only 5 minutes to do the same for Pacemaker/Corosync. That speaks volumes about its leaner architecture and simplicity. In addition, the close architectural correspondence (TSA ↔ Pacemaker and RSCT ↔ Corosync) greatly reduced our development cost compared to other alternatives.
  6. Performance boost. Even with default settings (no extra tuning effort), Pacemaker provides a ~30% improvement in the time from problem detection to workload resumption compared to the same test with TSA.

Q3: Is the Pacemaker solution a fit for all HA use cases?

Answer: There are multiple layers of consideration for this. Let’s start with the simple one.

Your target deployment platform and architecture

As of Db2 V11.5.8.0, the Db2 Integrated Pacemaker solution can be deployed on Red Hat Enterprise Linux from RHEL 8.x onwards and on SUSE Linux Enterprise Server from SLES 15 SP1 onwards, on Intel x86, POWER Linux, and Linux on IBM Z architectures, both on premises and on public cloud. Refer to the prerequisite page https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-prerequisites-integrated-solution-using for details.

Your desired High Availability model

Starting with 11.5.5.0, the Db2 Integrated Pacemaker solution is available for HADR with up to three standbys in total (including two auxiliary standbys). In 11.5.8.0, another popular HA model, Mutual Failover with shared disk, was added to the support list. Once again, both configurations can be deployed on premises and in the cloud, provided all prerequisites are met. Database Partitioning Feature (DPF) and pureScale are not yet available as of this writing. Rest assured, both are under development and are prioritized for the next major release.
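The support levels above can be captured in a small lookup. What follows is only an illustrative sketch in Python: the table and function names (`PACEMAKER_SUPPORT`, `parse_level`, `pacemaker_available`) are hypothetical and not part of Db2 or db2cm, and the release levels are simply the ones quoted in this article.

```python
# First Db2 11.5 level with Integrated Pacemaker support per HA model,
# as quoted in this article. None means "not yet available as of this writing".
# All names here are hypothetical; nothing below is a Db2 or db2cm API.
PACEMAKER_SUPPORT = {
    "HADR": (11, 5, 5, 0),
    "MF": (11, 5, 8, 0),
    "DPF": None,
    "pureScale": None,
}


def parse_level(level: str) -> tuple:
    """Turn a level string such as 'V11.5.8.0' into (11, 5, 8, 0)."""
    return tuple(int(part) for part in level.lstrip("Vv").split("."))


def pacemaker_available(model: str, db2_level: str) -> bool:
    """Return True if the given HA model has Pacemaker support at db2_level."""
    first_supported = PACEMAKER_SUPPORT[model]
    if first_supported is None:
        return False
    # Tuples compare element by element, so (11, 5, 8, 0) >= (11, 5, 5, 0).
    return parse_level(db2_level) >= first_supported


if __name__ == "__main__":
    for model in PACEMAKER_SUPPORT:
        print(model, pacemaker_available(model, "V11.5.8.0"))
```

A check like this only answers the first half of the WHEN question; your own upgrade schedule and the prerequisites on the page referenced earlier still decide the rest.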

Your need for Db2 Integrated vs Db2 Non-Integrated cluster manager

Perhaps this is where confusion has arisen in the past. Let’s first define what “Integrated” means. In Db2 terms, a “Db2 Integrated cluster manager” means users are not required to perform any specific cluster manager actions or commands to orchestrate recovery (in the form of takeover, failover, or local restart) from the failure of resources under surveillance by the cluster manager. The end-to-end process, from the point of failure detection to full recovery, is completely automated. Prior to Pacemaker, TSA was the ONLY cluster manager fully integrated with Db2 in all High Availability models (HADR, MF, DPF, and pureScale).

Of course, TSA can also be configured as a Non-Integrated cluster manager. This is often seen in Mutual Failover with shared disk setups where users specifically want failover to the standby host to be a manual action. In this setup, customers typically configure their own resource monitoring to notify DBAs of errors. The DBA, once alerted to the situation, must react by performing the necessary repair on the local host or by manually triggering the failover to the standby. PowerHA, Veritas Cluster Server (VCS), and Microsoft Cluster Service (MSCS) are other widely adopted cluster managers that are not integrated with Db2. In these scenarios, Db2 provides support only at the Db2 level; support at the cluster manager level is the responsibility of the cluster service provider, at the user’s expense.

With the above definition in mind, anyone who has deployed the Db2 Integrated cluster manager solution (with TSA) can move to the new Integrated Pacemaker solution relatively easily, provided all prerequisites are met. If you have deployed a Db2 Non-Integrated cluster manager solution and wish to switch to automated failure detection and recovery orchestration, or to move to the cloud, the first step is to study carefully which resources are monitored in your current environment and how, then compare that with what the Integrated Pacemaker solution offers. Some adjustments to the resources monitored and the actions taken on failure may be necessary to fully adopt the new solution.

Your modifications and/or deviations from standard resource model

Typically, if you are already using TSA as an integrated solution today, the cluster was set up by our integrated cluster manager utility, db2haicu. As such, there should be very few, if any, modifications to the resource model. Overall, the fewer additions to and deviations from the base resource model that db2haicu sets up automatically, the easier it is to adopt the new Db2 Integrated Pacemaker solution.


Q4: Can the Pacemaker solution be deployed on all public clouds?

Answer: Absolutely. Recall that the new Db2 Integrated Pacemaker solution targets both on-premises and cloud deployment, as long as our prerequisites are met. This question tends to come up because only two public cloud vendors, namely AWS and Azure, are currently referenced on the “Public cloud vendors supported with Db2 Pacemaker” page of our documentation https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-public-cloud-vendors-supported-db2. To clarify, our solution can be deployed on major public cloud vendors using the published setup instructions via db2cm. On cloud, our goal is to leverage as many vendor-specific technologies as possible to optimize cost, improve functionality, and enhance the up-and-running experience. This has led us to explore specific cloud vendors’ features and integrate them into our utility. Using vendor-specific virtual IP (VIP) components and fencing technology are prime examples of this. As of Db2 11.5.8.0, we have completed this effort on AWS and Azure. Others, such as IBM Cloud and GCP, will be tackled in future releases. For now, users on those clouds may need to use alternate mechanisms, such as ACR instead of VIP, and introduce a third host for quorum instead of using cloud fencing.
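To make the distinction concrete, here is a minimal Python sketch of the choice described above, as of Db2 11.5.8.0. The names (`INTEGRATED_VENDORS`, `cloud_ha_plan`) and the dictionary keys are hypothetical, chosen for illustration only; they are not db2cm options or settings.

```python
# Vendors for which the vendor-specific VIP and fencing integration is
# complete as of Db2 11.5.8.0, per this article. Hypothetical names throughout.
INTEGRATED_VENDORS = {"AWS", "Azure"}


def cloud_ha_plan(vendor: str) -> dict:
    """Return the client-reroute and split-brain mechanisms for a cloud vendor."""
    if vendor in INTEGRATED_VENDORS:
        return {
            "client_reroute": "vendor-specific virtual IP",
            "split_brain_protection": "vendor-specific cloud fencing",
        }
    # Other clouds (for example IBM Cloud or GCP) fall back to alternates.
    return {
        "client_reroute": "ACR",
        "split_brain_protection": "third host for quorum",
    }


if __name__ == "__main__":
    for vendor in ("AWS", "Azure", "IBM Cloud", "GCP"):
        print(vendor, cloud_ha_plan(vendor))
```

The point of the sketch is that the HA model itself does not change across clouds; only the mechanisms for client reroute and split-brain protection differ.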


Q5: What are the supported platforms, prerequisites, and restrictions with Pacemaker?

Q6: Where do we get started with Pacemaker?

Answer: Start with the documentation on configuring high availability with db2cm: https://www.ibm.com/docs/en/db2/11.5?topic=pacemaker-configuring-high-availability-db2cm


Q7: What is the roadmap?

We are looking to add the following support in future releases:

  1. DPF
  2. pureScale
  3. Mount monitoring support in HADR
  4. Non-root user db2cm support
  5. Customized config on GCP & IBM Cloud
  6. AIX support
  7. Containerization

In closing, take the red pill if what’s available in Pacemaker today meets your criteria. The days of TSA in Db2 are numbered.

Next up in this series is The Book of Db2 Pacemaker – Chapter 2: Pacemaker Cluster … Assemble!, followed by topical studies and technical deep dives on the resource model, maintenance/health checks, quorum, cloud specials, and much more. If there is a specific topic you would like to see, feel free to reach out to me.

Once again, take the red pill now - “This is the way and I have spoken!”.


Alan Lee joined IBM in 1999 and spent his first two years as the UNIX development infrastructure team lead, focusing on improving overall development efficiency. He has been with the Db2 kernel development team since 2001. His early contributions spanned various layers of the Db2 kernel: operating system services, exploitation of file system features, pureScale infrastructure, and object storage support in AWS and IBM Cloud. His latest achievements include bringing pureScale to AWS Marketplace, reducing deployment time from weeks or months to minutes. Development roles aside, he led the Db2 LUW Support organization in all geos, improving its support Net Promoter Score (NPS) by double digits within a year. Currently, he is the product owner of Db2 pureScale & high availability and the upline development manager for Security and the Db2/SAP partnership.