Let me introduce you to the infrastructure of a project I've been working on recently:
- My development team is scattered over three continents
- My project e-mail is at Gmail
- Our project documents are at Dropbox
- Our source code repositories are at Bitbucket
- Our infrastructure management is done via RightScale
- Our application servers are at Amazon Web Services (AWS)
- Most importantly, our DB2 database servers are also at AWS
The common theme running through this entire setup is what has become known as “the cloud.” A couple of years ago this was the buzzword of the moment, and every piece of technology had to be “cloud ready” or “cloud enabled” to be worthy of consideration by technology purchasers. But what is really meant by “the cloud”?
Perhaps a more meaningful description for “the cloud” is “utility computing.” As far back as the 1960s, computer science academics predicted that one day computing power would be available to the general public under a business model similar to that of household utilities such as electricity and gas: in other words, you paid for what you used and left others to worry about the production and delivery of the service. The spread of the Internet made this prediction a reality. It started with common facilities with relatively simple protocols, such as email, and gradually offered more and more of the services previously only available to those who had the financial resources to run their own data centres or the skills to manage their own servers.
Cloud services are normally described at three levels –
- Software as a Service (SaaS): a cloud provider makes a service available to a customer, who interacts with this service through a client, normally browser-based. Gmail and other webmail systems are a simple example. More complex offerings are now available, going as far as complete CRM systems such as Salesforce.com.
- Platform as a Service (PaaS): a cloud provider makes basic infrastructure building blocks available to a user and manages the hardware and software maintenance of these building blocks. The building blocks can be at a fairly low level, such as databases or application servers, or at a slightly higher level, such as common application components (the best-known offering in this latter category is Google App Engine).
- Infrastructure as a Service (IaaS): this is the lowest level of service provision, with the provider making available an execution environment of virtualized servers which can then be configured to the customer’s needs. The consumer is still involved in the build of the software stack to some degree, but does not get involved in the provision or maintenance of hardware.
These three general levels do not have hard and fast boundaries. For example, some DB2 facilities fall between PaaS and IaaS with prebuilt DB2 servers (PaaS) available as images which can be deployed onto various IaaS providers' infrastructures.
IBM started providing DB2 for various cloud environments at the height of the “cloud buzz.” Initially it provided prebuilt DB2 servers as images which could be deployed on the Amazon Web Services (AWS) cloud. These were often combined with application delivery platforms such as Ruby on Rails (RoR) or Websphere Application Server Community Edition (WAS-CE) to provide a ready built environment for standard Web applications with DB2 as the back-end database server. This made it easy for developers to gain experience of DB2 without having to go through the pain of acquiring, building and configuring the hardware and system software.
One of the issues with the prebuilt images is that once you start to use them you then have to maintain them. Operating systems and DB2 both need upgrading. You could either upgrade the image in situ or acquire a newer image and move the database over onto this. Neither of these were ideal solutions.
The next stage in the evolution of DB2 support was to separate the environment into three parts –
- System software installation, including the operating system and DB2 software, which could be replaced without affecting the underlying business data
- DB2 instance, and its related home directories, which needed to be retained and would need some maintenance activity if the related system software changed
- Business data which needed to be preserved regardless of what happened to the system software or DB2 instance.
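In practice this three-way separation maps naturally onto separate disk volumes: a disposable root volume for the system software, and persistent volumes for the instance and the data. A minimal sketch (device names and mount points are illustrative, not IBM's actual layout):

```shell
# Root volume: OS + DB2 software -- disposable, replaced on upgrade.
# Separate persistent volumes hold the parts that must survive a rebuild:
mkfs -t ext4 /dev/xvdf            # volume for the DB2 instance home
mkfs -t ext4 /dev/xvdg            # volume for the business data
mkdir -p /db2home /db2data
mount /dev/xvdf /db2home          # e.g. instance owner home /db2home/db2inst1
mount /dev/xvdg /db2data          # table space containers live here
```

When the system software is replaced, only the root volume changes; the two mounted volumes are re-attached to the new server.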
At the time this was being looked at by IBM, two advancements in cloud provision were taking place –
- Storage areas which could be retained permanently rather than existing only for the lifetime of a particular server. This storage could be detached from one server, either while the server was still active or when a server was shut down, and attached and configured on another server. The best known example of this type of storage is Amazon's EBS (Elastic Block Store). This allowed IBM to separate and preserve the DB2 instance and business data when rebuilding the underlying system software.
- Cloud management facilities started to emerge. These shifted the pattern for maintaining cloud-based services from using snapshot images to using recipes for defining the components to be built, combined with tools to actually build the servers using the instructions in these recipes. Open source projects developed Domain Specific Languages (DSLs) for describing configurations: the two most common of these are Puppet and Chef. Commercial companies built around these technologies or implemented similar concepts with proprietary developments, making the technology accessible to a wider user base. They also added seamless interfaces to a range of cloud infrastructure providers. IBM partnered with RightScale to deliver build specifications for DB2.
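To illustrate the first of these developments (with placeholder volume and instance IDs; exact device naming varies by virtualization platform), moving a persistent EBS volume from an old server to its replacement with the AWS CLI looks roughly like this:

```shell
# Detach the persistent data volume from the old server...
aws ec2 detach-volume --volume-id vol-0123456789abcdef0

# ...attach it to the replacement server...
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0fedcba9876543210 --device /dev/sdf

# ...and mount it so the rebuilt DB2 software sees the preserved data
mount /dev/xvdf /db2data
```

The volume, and everything on it, outlives any individual virtual server.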
Having first experimented with the DB2 AWS images, we soon realised the limitations of this approach. After discussing these issues with IBM's Leon Katsnelson at an IDUG DB2 Conference (always a good place to get help), we investigated the DB2 RightScale resources.
The initial deliverable from IBM was a ServerTemplate for building a DB2 Express-C server onto virtualized Linux servers. Initially this DB2 (Version 10.1) was only deployable onto two cloud infrastructure providers: AWS and Rackspace. Within the last few weeks (September 2013) IBM has released a new template for building DB2 10.5 Fixpack 1 and this can be deployed onto six different cloud infrastructure providers.
To build a DB2 server on one of the supported clouds, you import the IBM-supplied definition (ServerTemplate) into a deployment specification, provide values for a number of configuration variables and then request the deployment to take place. There is a small set of required configuration parameters, including the cloud infrastructure provider to use, the names and sizes of the disk volumes for the DB2 instance and data, and the instance owner credentials. There are also many optional parameters. The IBM ServerTemplate package contains many useful features, such as the ability to add additional users and to set up backups to a range of cloud storage / archiving facilities. The new 10.5 ServerTemplate has even more facilities and has been changed from using basic UNIX shell scripts to using Chef: this makes it easier to extend the IBM templates, including better handling of dependencies and deployment sequence.
Of course a database is not much use without applications. It is generally good practice not to run applications on the database server but on another instance, both in terms of performance and ease of maintenance. IBM's initial provision lacked automation for building a DB2 client installation. While Java applications can be deployed with the two JAR files required to connect to DB2, other commonly used application execution environments such as Ruby on Rails and PHP (Zend Server) require a local DB2 client install. In July 2013 IBM at last produced a RightScript (an individual configuration component which can be mixed in with an application server deployment) for installing and configuring the IBM Data Server client. We were able to combine this with a Zend Server installation RightScript, which included prebuilt DB2 drivers for PHP, to deploy an application server suitable for running our PHP application code.
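Once the Data Server client is installed on the application server, the remote database still has to be catalogued before non-Java drivers (such as PHP's ibm_db2 extension) can reach it. A minimal sketch, with hypothetical host names, node names and ports:

```shell
# Register the remote DB2 server with the local IBM Data Server client
db2 catalog tcpip node appnode remote db2server.example.com server 50000

# Register the database held on that node
db2 catalog database sample at node appnode

# Verify connectivity from the application server (prompts for a password)
db2 connect to sample user db2inst1
```

A RightScript automating a client install would typically run steps like these as part of the application server deployment.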
It is important to make some remarks about managing DB2 in the cloud. While we have found AWS to be very reliable, there are well-publicized cases of outages at cloud infrastructure providers.
In a standard DB2 deployment, if you require any degree of disaster recovery capability you would always ensure that regular backups are taken and that (copies of) these backups are stored in a location other than where the server resides. It is no different when you are running DB2 in the cloud: you still need to take regular backups and store them somewhere other than on the (virtual) DB2 server. You should also take the same care of your archive logs as you would in a traditional environment. Within the IBM-supplied ServerTemplates there are facilities to set up and store backups in cloud archive storage solutions such as Amazon S3.
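By way of illustration (the database name, backup path and S3 bucket are placeholders, not the ServerTemplate's actual mechanism), a scripted backup that is then copied off the virtual server might look like:

```shell
# Take a compressed online backup, including the logs needed to restore it
db2 backup database SAMPLE online to /db2backup compress include logs

# Copy the backup image off the (virtual) DB2 server into archive storage
aws s3 cp /db2backup/ s3://example-db2-backups/$(date +%Y%m%d)/ --recursive
```

The key point is the second step: a backup that only exists on the server it protects is no backup at all.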
Similar considerations apply if you intend to provide High Availability or Disaster Recovery for your cloud-based DB2 servers using HADR. Just as you wouldn't put both sides of an HADR cluster in the same rack in your server room, you should ensure that your cloud-based HADR cluster is geographically separated. Most cloud infrastructure providers allow you to choose deployment to different locations. If your cloud provider has only one location then it would be a good idea to deploy your standby server with another cloud provider (ensuring, of course, that they don't actually share the same data centre). A word of explanation about AWS terminology is appropriate here to ensure you understand what you may be getting. AWS offers multiple “availability zones” within each “EC2 region.” Availability zones are physically separate data centres in the same geographical area (perhaps separated by little more than a fireproof wall); EC2 regions are geographically separate. In general, if you are using HADR for (local) HA then you would place the two servers in different availability zones within the same region. However, if you are using HADR to provide disaster recovery then you would want your two servers to be in different regions.
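As a sketch of the HADR side of such a setup (host names, ports and the database name are illustrative; the ASYNC synchronization mode reflects the cross-region DR scenario rather than local HA):

```shell
# On each server, point the database at its HADR partner.
# Shown for the primary; the standby uses the mirrored host/port values.
db2 update db cfg for SAMPLE using \
    HADR_LOCAL_HOST  primary.eu-west-1.example.com \
    HADR_LOCAL_SVC   55001 \
    HADR_REMOTE_HOST standby.us-east-1.example.com \
    HADR_REMOTE_SVC  55001 \
    HADR_REMOTE_INST db2inst1 \
    HADR_SYNCMODE    ASYNC

# Start the standby first, then the primary
db2 start hadr on database SAMPLE as standby    # run on the standby server
db2 start hadr on database SAMPLE as primary    # run on the primary server
```

ASYNC mode tolerates the network latency between regions; for two availability zones in the same region a stricter mode such as NEARSYNC would normally be preferred.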
There are a number of reasons why you might consider running DB2 in the cloud. The most common of these are –
- Flexible availability of resources without large capital expenditure on servers. If extra capacity is needed this can be readily obtained. This capacity comes in two forms: more DB2 instances (perhaps to support more developers or parallel testing) or more processing power in a single DB2 instance (perhaps to support production workloads at peak periods)
- Reducing costs, particularly staff costs, by “getting out of the hardware business.” The money saved can be directed towards building your business.
There are a number of concerns about cloud computing, particularly with regard to security and lack of control. The cloud infrastructure providers have invested heavily in addressing these concerns, and the security infrastructure they have in place should be at least as good as (and in most cases probably better than) the security of servers in most private, but Internet-connected, data centres. Of course, due diligence needs to take place when choosing a cloud infrastructure provider to ensure they meet your needs. One particular area of concern in some business sectors is data placement. Where the data is physically stored determines who can legitimately gain access to that data (e.g. for law enforcement purposes). Also, some businesses are required by law to store their data in the same geographical area as they carry out their business. This is particularly true in Europe, where EC data protection legislation is strictly enforced. To address these requirements and concerns, most of the major cloud infrastructure providers have data centres situated in key geographies around the world.
For those who still have concerns over using the public cloud, many of the same principles and technologies are now available from a variety of vendors (including IBM) for creating private cloud infrastructures. These bring many, but not all, of the benefits of the public cloud; key benefits such as not needing to manage your own hardware are, however, not realised.
As IBM continues to improve on its cloud offerings for DB2, IDUG will be at the forefront of assessing and providing technical guidance on their use. Through our involvement in the DB2 Customer Advisory Council (CAC) we are also able to provide input on key customer requirements in this area. The provision by IBM of a DB2 client deployment RightScript was a direct result of IDUG’s involvement in the DB2 CAC. We look forward to carrying out a more detailed analysis of the new DB2 10.5 ServerTemplate soon: watch this space.
Until then … don't just do it, DB2 it !!!