DevOps is one of the current favourite buzzwords among IT strategists. Like many such buzzwords, it has attracted many definitions, and at the end of the day much of what it describes turns out to be similar to procedures that many of us in Enterprise IT have been following for years under different terminology. For clarity, therefore, let us start with a definition (taken from Wikipedia): “DevOps is a culture, movement or practice that emphasizes the collaboration and communication of both software developers and other IT professionals while automating the process of software delivery and infrastructure changes”. In many ways much of what is described there is not new. We’ve been looking for ways to control and automate these processes for many years. In fact, one of the first projects I worked on when I joined the industry over a quarter of a century ago was a suite of software to control and automate the building and deployment of mainframe software components (software which in large part is still in use today).
Where I believe that DevOps seeks to do something new is in building bridges between the different disciplines that are involved in software delivery, deployment and operation. I guess that this is indicated in the name, since DevOps is a bringing together of “Development” and “Operations”.
When we started looking at automating the processes involved in the delivery of database changes, it became evident that there was major work to be done in bridging these two worlds. Developers were working in the graphical IDE world of Data Studio. DBAs and implementers would be found in one of two spaces: either a command line and script world (DB2 for LUW) or a JCL and SPUFI world (DB2 for z/OS). Most of what will be discussed in the rest of this article was built to address the DB2 for LUW situation, but recently we have been applying exactly the same principles to a DB2 for z/OS environment (and indeed to a hybrid LUW / z/OS development).
The Importance of Source Control
At the centre of any DevOps process is a need to keep track of the status of each component in the environment. For software changes, including database changes, that leads to the need for source control. One would have thought this was obvious, but I’ve found over the years that many places do not have source control of their DDL. Many times I’ve heard the phrase “the catalog is king” quoted to me, with the suggestion that there is no need to manage database build DDL because they can always generate DDL from the database catalog. This assumes that DDL generated from the catalog by whichever tool they use is actually going to be correct. Unfortunately, in recent years as the pace of change in DB2 has increased even the best of tooling has not been able to keep up. If an environment only uses simple structures then no problems are likely to be experienced. But the more that new features are exploited, the more likely that this strategy is going to fall down. Ironically, DB2 for LUW used to be more problematic in this area than DB2 for z/OS, but in recent years db2look has improved dramatically, probably because other IBM products rely on it.
So, it is strongly advised that a strategy for maintaining database build DDL in source control is established. It is less critical which particular source control tooling is used; it is much more important that it is being used. Often the choice of source control tool is determined by corporate strategy. For the database tooling we are going to consider in this article, primarily IBM Data Studio and the operating system command line environments, support is available for most source control platforms.
Source Control Strategies
There are two broad strategies for managing DDL for database changes in source control: keeping full database build snapshots or keeping incremental version scripts. The fact that any change to an existing database must take steps to preserve the data already in it and the integrity of related objects makes database change management much more complicated than standard application development, where often all that is required is a new executable to be produced and used to replace the existing one.
The first strategy maintains in source control the full DDL to create all the database objects required to support an application. It often also includes DML to populate static (lookup) tables and sometimes also DCL to grant access to the created objects. As changes come in, the master DDL scripts are upgraded to reflect the current state of the database. The transformation scripts for converting from one version of the database are not generally stored. This approach is most favoured where tooling to generate the change scripts has been found to be adequate, and is most often seen in DB2 for z/OS environments.
The second approach starts by storing a full baseline DDL build, and then a set of operations to transform the database to subsequent versions. Knowing the current version that a database is at, it is then possible to go to any subsequent version by running each version transformation operation set in order. Often a set of reversal scripts is also stored, so that the database can be moved back a version as well (more often used during development than against production databases, thankfully). This approach is the one baked into popular web development frameworks such as Ruby on Rails, and is also the process supported by open source tools such as Flyway and Liquibase.
In practice, a hybrid approach often works best. During development there may be many small incremental versions maintained. However, change windows and other operational considerations normally require that a release package consolidates versions operating on the same object set together. And once a package of changes has been deployed into production, the baseline scripts are updated to reflect the current status so that building new development environments is as smooth as possible. How often incremental changes are consolidated, and how often and at what stage in the development lifecycle the base build is updated, will depend on the type of development methodology being followed. It takes rigour to ensure that updates take place, particularly to the base builds, but this will repay itself in easier maintenance of development environments and fewer operational problems.
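The incremental strategy can be sketched with a simple ordering convention: each version script carries a number, and a deployment applies, in order, every script numbered higher than the version the target database is currently at. The sketch below is illustrative only; the Flyway-style file naming, the table contents and the recorded version are all invented for the example, and in a real deployment each selected script would be executed rather than just listed.

```shell
# Minimal sketch of incremental version application (all names illustrative).
# Scripts are named V<nnn>__<description>.sql; "current" is the version the
# target database is already at.
mkdir -p migrations
printf 'CREATE TABLE T1 (C1 INT);\n'         > migrations/V001__baseline.sql
printf 'ALTER TABLE T1 ADD COLUMN C2 INT;\n' > migrations/V002__add_c2.sql
printf 'CREATE INDEX I1 ON T1 (C2);\n'       > migrations/V003__add_index.sql

current=1
plan=""
for f in $(ls migrations/V*.sql | sort); do
  # extract the numeric version from the file name
  v=$(basename "$f" | sed 's/^V0*\([0-9]*\)__.*/\1/')
  if [ "$v" -gt "$current" ]; then
    plan="$plan $f"   # in real life: run the script here via CLP/CLPPlus
  fi
done
echo "would apply:$plan"
```

The same loop, run against a database at version 0, would pick up the baseline script too, which is how a brand-new environment gets built from the same set of files.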
Looking Inside a Data Studio Project
Having decided that we are going to manage our DDL in source control, we now turn our attention to database development with Data Studio. By “database development” we mean –
- Traditional database objects such as tables and indexes
- “Application objects” such as stored procedures and UDFs (user defined functions)
- Maintenance of static data
- Database configuration parameters
- XSDs for storage in the XML Schema Repository (XSR)
For many the tool of choice for managing all these during development is IBM Data Studio. Data Studio projects are maintained in a source control tool. But when it comes to deploying these objects into production, it is normally expected that an automated process should be able to take what Data Studio has produced, and use these without having to manually intervene in any way.
To understand how this can be done, we need to examine the files produced by a Data Studio project, in particular a Data Development Project. The items which can be created are split into a number of categories on the user interface, and each of these types has a different file extension and different characteristics. Here we will only consider the subset of items in which we are particularly interested.
- SQL Scripts: files with a file extension .sql are found here. Each file can contain many SQL statements, with each statement separated by a terminator. The terminator is a semicolon by default but can be different if needed.
- Stored Procedures: files are <spname>.spsql for the first stored procedure of a particular name, with subsequent stored procedures with the same name (different signatures) being called <spname>_<n>.spsql (<n> is a number starting with 1). The code for a single stored procedure is held in each .spsql file, and there is no end terminator.
- UDFs: files are <udfname>.udfsql for the first UDF of a particular name, and follow the pattern for stored procedures for both subsequent UDFs and for the contents.
- PL/SQL Packages: files are <pkgname>.pkgsql and a single file contains all the code for a package (the package definition itself, the package body and all the contained procedures and functions). We won’t deal with these any further in this article.
- XML files: the suffixes for storing XML files are as expected. Regular XML documents are suffixed .xml, XSDs are .xsd, XSLTs are .xslt and WSDLs have .wsdl.
The SQL scripts present no challenge for execution. They can be run from a variety of tools, including the long-established DB2 Command Line Processor (CLP).
The stored procedure and UDF files are more of a challenge: DB2 CLP needs a statement terminator to operate successfully, and the Data Studio .spsql and .udfsql files do not contain one. We do not want to write SPs and UDFs in standard SQL files, since Data Studio provides extra functionality for them, such as debugging, from the appropriate specialized menu options. Our aim is to be able to use the files produced by Data Studio unchanged in automated deployments.
CLPPlus to the Rescue
However, we discovered that the newer DB2 tool CLPPlus is able to accept all the scripts produced by Data Studio unchanged. Over time we have discovered and been able to exploit other features of CLPPlus to enhance the automation, but it was this feature that first brought us to the tool. We have also found some limitations in the tool, which we will mention later.
A typical interactive execution of CLPPlus would use the command –
clpplus -nw db2inst1@localhost:50000/SAMPLE
The “-nw” (non-windowed) option prevents a separate window being opened up by CLPPlus, and is essential for running over SSH on servers (particularly those with no graphical interface available).
It is possible to execute a script (say wrapper.clp) directly from the command line or in a shell script by including the password and appending the script name to the end of this command –
clpplus -nw db2inst1/password@localhost:50000/SAMPLE @wrapper.clp
However, we found that to exercise more control over the connection, including using SSL/TLS to secure it, it was essential to use a db2dsdriver.cfg file.
A typical file would be –
<configuration>
  <dsncollection>
    <dsn alias="SAMPLE" name="SAMPLE" host="localhost" port="50443"/>
  </dsncollection>
  <databases>
    <database name="SAMPLE" host="localhost" port="50443">
      <parameter name="SecurityTransportMode" value="SSL"/>
      <parameter name="SSLServerCertificate" value="db2server.arm"/>
      <parameter name="UserID" value="db2inst1"/>
      <parameter name="Password" value="mypassword"/>
    </database>
  </databases>
</configuration>
It is necessary to set the DB2DSDRIVER_CFG_PATH environment variable to point to the directory containing db2dsdriver.cfg (the directory shown here is illustrative) –

export DB2DSDRIVER_CFG_PATH=/home/db2inst1/conf
While it is possible to build up a db2dsdriver.cfg file containing definitions for many connections, we found it better to set up a template for one database and substitute the various database specific values at execution time using a search and replace tool such as sed. This also meant that we did not need to leave a file with connection credentials lying around, as these would be fed into the deployment script as parameters at run time.
When using the db2dsdriver.cfg file, the command for execution of CLPPlus is simplified –
clpplus -nw db2inst1@SAMPLE @wrapper.clp
Obtaining the various variables for the execution, setting up the CLPPlus execution environment, configuring db2dsdriver.cfg, executing CLPPlus and cleaning up afterwards are all controlled by a shell script, which can remain virtually unchanged for all uses. The deployment-specific details are all to be found in the file executed by CLPPlus (wrapper.clp). To enable the widest possible portability, it was decided to use a shell that DB2 itself requires: the Korn shell (ksh). We discovered that on AIX a very old version of the Korn shell is available as simply ksh. To get cross-platform compatibility we had to use the ksh93 alias, which works on all AIX and Linux variants we have tried.
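The shape of that controlling script can be sketched as below. This is not the authors' actual script: the parameter values, placeholder names and template contents are invented for illustration, and the CLPPlus invocation is only echoed here rather than executed. The key ideas it shows are templating db2dsdriver.cfg with sed at run time (so no file with real credentials is left lying around) and pointing CLPPlus at the generated file via DB2DSDRIVER_CFG_PATH.

```shell
# Sketch of a controlling deployment script (all names illustrative).
DBHOST=localhost
DBPORT=50443
DBNAME=SAMPLE
DBUSER=db2inst1

WORKDIR=$(mktemp -d)

# checked-in template with placeholders instead of real values
cat > "$WORKDIR/db2dsdriver.cfg.template" <<'EOF'
<configuration>
  <databases>
    <database name="%DBNAME%" host="%DBHOST%" port="%DBPORT%">
      <parameter name="SecurityTransportMode" value="SSL"/>
    </database>
  </databases>
</configuration>
EOF

# substitute the deployment-specific values at execution time
sed -e "s/%DBNAME%/$DBNAME/g" -e "s/%DBHOST%/$DBHOST/g" \
    -e "s/%DBPORT%/$DBPORT/g" \
    "$WORKDIR/db2dsdriver.cfg.template" > "$WORKDIR/db2dsdriver.cfg"

# point CLPPlus at the directory holding the generated db2dsdriver.cfg
export DB2DSDRIVER_CFG_PATH="$WORKDIR"

CMD="clpplus -nw $DBUSER@$DBNAME @wrapper.clp"
echo "would run: $CMD"     # the real script executes this, then cleans up

rm -f "$WORKDIR/db2dsdriver.cfg.template"
```

In the real script the password is fed in as a run-time parameter and the whole work directory is removed after CLPPlus completes.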
Files Not Visible In Data Studio
Having set up the CLPPlus automated execution by checking the project out of source control onto a UNIX server, we found that even after checking the various files back into source control, some of the files were not visible in Data Studio. This is because a Data Studio Data Development Project only recognizes certain file extensions; the other files are actually there but not visible. We discovered it was not possible to extend the range of files visible in the project. We have also requested an “Other Files” tab to be added, but this hasn’t appeared as yet. However, we found that all the files are visible using the Data Studio Navigator view: Window / Show View / Other… / General / Navigator.
Other Useful CLPPlus Features
While working on automation we were able to exploit a number of other useful CLPPlus features. We’ve included some basic information about some of these here, but there are a lot more.
- Execution logging: enable a log of a CLPPlus script’s execution using “SPOOL <logfile>”. To enable detailed logging, with full statement details rather than just results, use “SET ECHO ON”. To close the log file at the end use “SPOOL OFF”. You can use variables (see below) in constructing the log file name, enabling you to create a different log file for each run (perhaps using a timestamp).
- Internal variable creation and substitution: variables can be defined using the “DEFINE <varname>=<value>” command. It is also possible to define a variable tied to a column alias using “COLUMN <column> NEW_VALUE <varname>”, which then allows you to populate a variable from a singleton SELECT statement (the COLUMN command also allows, in conjunction with the BREAK command, display formatting of results from a multi-row select, but that is beyond the scope of this article). When the option “SET DEFINE ON” is in effect, the value of the variable can be used by coding “&<varname>”.
- Environment variable passing: an environment variable defined in the shell environment using “EXPORT <envvar>=<value>” can be accessed in the CLPPlus script by enabling variable substitution support using “SET DEFINE ON” (as above) and then referring to the variable as “$<envvar>” (note the dollar sign, rather than the ampersand used for locally defined variables).
- Error handling behaviour: error handling is fairly basic, but does allow execution to be halted on a SQL exception using “WHENEVER SQLERROR EXIT” and on an error in an operating system command using “WHENEVER OSERROR EXIT”. For statements which may legitimately return an error (for example attempting to delete a table which might not exist) then error handling can be disabled using “WHENEVER SQLERROR CONTINUE NONE” (similar for OSERROR).
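These features combine naturally in a deployment wrapper. The sketch below generates an illustrative wrapper.clp showing logging, variables, a variable captured from a query, and error handling together; the run ID, log name and the table being dropped are all hypothetical, and a real wrapper would of course go on to run the actual deployment scripts.

```shell
# Generate an illustrative CLPPlus wrapper script (all names hypothetical).
cat > wrapper.clp <<'EOF'
SET DEFINE ON
SET ECHO ON
WHENEVER SQLERROR EXIT

DEFINE runid=20240101120000
SPOOL deploy_&runid..log

-- capture a value from the database into a variable
COLUMN cnt NEW_VALUE tabcount
SELECT COUNT(*) AS cnt FROM SYSCAT.TABLES;

-- a statement that may legitimately fail, so disable error handling for it
WHENEVER SQLERROR CONTINUE NONE
DROP TABLE MYSCHEMA.SCRATCH;
WHENEVER SQLERROR EXIT

SPOOL OFF
EXIT
EOF
echo "generated wrapper.clp"
```

Note the “&runid.” form in the SPOOL name: the trailing dot ends the variable reference so the .log suffix can follow it, giving a distinct log file per run.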
Some CLPPlus Limitations
There are some things that CLPPlus cannot do which the older CLP can, and these limit what we were able to achieve with this technique. The limitations were particularly restrictive when defining automation for creating and configuring a new database, and were worked around by implementing significant portions of this function as old-style CLP commands executed directly from the shell script.
- Limited CREATE DATABASE support: while the CREATE DATABASE command is supported from CLPPlus, it only supports a small subset of parameters (CODESET, TERRITORY and PAGESIZE) and so really isn’t useful.
- Database commands: apart from the limited CREATE DATABASE command, no other DB2 commands are supported directly from CLPPlus. Thankfully, many of the more common commands are supported via the ADMIN_CMD stored procedure.
- XSD control: the commands REGISTER XMLSCHEMA, ADD XMLSCHEMA DOCUMENT and COMPLETE XMLSCHEMA are used to manage the XML Schema Repository (XSR). These are DB2 commands, not SQL statements, but are not supported by ADMIN_CMD. There are system stored procedures which perform the same tasks (XSR_REGISTER, XSR_ADDSCHEMADOC and XSR_COMPLETE) but we struggled to get these to work due to the need for the XSD documents to be presented to them as BLOBs.
- Only a small subset of the many db2dsdriver.cfg parameters are supported by CLPPlus. Any that are not supported are simply ignored. The supported parameters cover authentication and encryption. None of the parameters that allow configuration of the connection (e.g. CurrentSQLID) are supported. We found it convenient to set up any of these values we needed as environment variables in the shell script, which we then consumed via the environment variable passing functionality in the CLPPlus wrapper script.
Coping With Platform Differences
We had a requirement to cater for different builds of certain databases on multiple platforms, both different versions of DB2 for LUW and also spanning different DB2 platforms (LUW and z/OS).
We investigated the use of anonymous blocks within SQL scripts, but this feature is not supported on all versions of DB2 for LUW, and not at all on DB2 for z/OS.
Therefore, we looked around for a convenient command line tool that could perform conditional processing of source code, and found some success using the C/C++ pre-processor (cpp). CPP is available as a system package on most UNIX systems (often installed by default). On Windows we found the easiest way to get CPP was to install Strawberry Perl, which includes a full C/C++ build stack.
Here is a small sample of a typical .cpp file, in this case used to define tablespace creation statements which may be executed on either DB2 for z/OS or DB2 for LUW –
#define BUILDLUW 0
#define BUILDZOS 1

CREATE TABLESPACE SDBA001
#if BUILDENV == BUILDLUW
#include "TSLUW8K.cpp"
#else
#include "TSZOS4K.cpp"
#endif
;
Apart from the #if / #else / #endif construct, we also use #include to pull in standard template files for common portions of the definition, in this case site defaults for LUW (TSLUW8K.cpp) and z/OS (TSZOS4K.cpp) tablespaces, helping to ensure consistency across all definitions. We would then run the pre-processor to create either a z/OS- or LUW-specific SQL script for execution –
cpp -D BUILDENV=<env> -P -w -I . <input.cpp> <output.sql>
where <env> is set to 0 (LUW) or 1 (z/OS) using a variable in our shell script.
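The pre-processor step can be tried end-to-end with a tiny, self-contained sketch. The template file contents below are invented stand-ins for real site defaults, but the invocation is exactly the one shown above, with BUILDENV=0 selecting the LUW branch. This assumes a standalone cpp is installed (it normally comes with the system C compiler).

```shell
# Invented stand-ins for the site template files
printf 'PAGESIZE 8 K\n' > TSLUW8K.cpp
printf 'SEGSIZE 64\n'   > TSZOS4K.cpp

# The conditional source file, as in the sample above
cat > input.cpp <<'EOF'
#define BUILDLUW 0
#define BUILDZOS 1
CREATE TABLESPACE SDBA001
#if BUILDENV == BUILDLUW
#include "TSLUW8K.cpp"
#else
#include "TSZOS4K.cpp"
#endif
;
EOF

# -D selects the platform, -P suppresses line markers, -w suppresses
# warnings, -I . finds the included templates in the current directory
cpp -D BUILDENV=0 -P -w -I . input.cpp output.sql
cat output.sql
```

Re-running with -D BUILDENV=1 would emit the z/OS branch instead, from the same source file.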
In some places we use placeholders for values which are deployment specific, and use sed to substitute the real values. For example for the tablespace definition portion of a table we have –
#if BUILDENV == BUILDLUW
IN SDBA001
#else
IN $DBNAME.SDBA001
#endif
We then run –
sed -i 's/$DBNAME/<dbname>/g' <file.sql>
to substitute in the real database name (supplied as a variable in the shell script). If you use $<value> to identify a substitution placeholder, take care that you actually substitute them all before executing the SQL with CLPPlus: any unresolved placeholder will be treated as an imported environment variable (as described above) and, if not found, will be replaced with an empty string.
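A simple guard against that trap is to scan the generated SQL for surviving placeholders before handing it to CLPPlus. The sketch below is illustrative (the placeholder convention, file name and database name are invented); it substitutes one value and then fails fast if any upper-case $PLACEHOLDER remains.

```shell
# Substitute a placeholder, then verify nothing was missed (names invented).
DBNAME=PRODDB01
printf 'CREATE TABLE T1 (C1 INT) IN $DBNAME.SDBA001;\n' > file.sql

# [$] matches a literal dollar sign in the sed pattern
sed "s/[$]DBNAME/$DBNAME/g" file.sql > file.sql.new && mv file.sql.new file.sql

# fail fast if any upper-case placeholder survives the substitution
if grep -n '[$][A-Z][A-Z0-9_]*' file.sql; then
  echo "unresolved placeholders remain" >&2
  exit 1
fi
echo "substitution complete"
```

A check like this costs one grep and saves a confusing failure later, when an empty string has silently replaced a database name deep inside a deployment.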
Hopefully some of these ideas will help others make a start on their own automation efforts.
Enabling Multi-Team Development (LUW Specific)
One challenge we faced as we moved into this “brave new world” was around enabling team working in shared development databases. The main issues were around the privileges required to create application objects (stored procedures and UDFs) and the actions required to change an application object created by another developer.
When creating application objects, access to the objects they reference must be granted to the individual user rather than to a group. This problem was overcome with the introduction of database roles: a privilege granted to a role is acceptable for the creation of application objects. However, the use of roles did not solve the second problem, which was that the “CREATE OR REPLACE” syntax only works for the original object creator. The older style of changing application objects (DROP and then CREATE) was problematic, since when the object was dropped existing execution permissions were lost and any “parent” objects (e.g. an SP which called the SP being changed) were invalidated.
As the use of stored procedures, essentially as Enterprise Data Services, increased, the number of developers working on them also increased, and this became a major administrative issue.
We found a solution using session authorization. In DB2 for LUW, the statement “SET SESSION AUTHORIZATION = ‘<auth-name>’ ALLOW ADMINISTRATION” allows a developer to assume a different authorization ID (it sets the value of the SESSION_USER special register). This is similar, but not identical, to the use of “SET CURRENT SQLID” on DB2 for z/OS; in particular setting the session authorization still retains the initial login SYSTEM_USER special register, which is used as the default schema / qualifier for unqualified SQL statements. However the SESSION_USER is used on object creation as the “OWNER” (different from “SCHEMA”), which is how it helps with our “CREATE OR REPLACE” issue.
There are some limitations as to when “SET SESSION AUTHORIZATION” can be performed. We found it best to produce a CONNECT_PROC (activated using the DB CFG parameter of that name, and requiring a database connection to set) which attempts to issue the “SET SESSION AUTHORIZATION” but allow the connection to continue if not authorized with the initial authorization ID. We can then add developers who are allowed to use session authorization (i.e. who are allowed to define objects) to a group with the SETSESSIONUSER privilege.
We were able to restructure our database standards to move all actions requiring SYSADM privilege into a base database build. We used “CREATE SCHEMA <schema> AUTHORIZATION <schema>” to create a schema for all objects, where the schema is owned by an authorization ID of the same name, which will also be the one we use for session authorization. We create a set of standard bufferpools and tablespaces, and give use of the tablespaces to that ID as well. Then all developers with access to the authorization ID via session authorization can create (and “CREATE OR REPLACE”) objects easily.
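The base-build pattern just described can be sketched as a single script. The DDL below is an illustrative outline, not our actual build: the schema, bufferpool, tablespace and group names are hypothetical, and a real build would use the site's storage and sizing standards. The sketch writes the script out so its shape is visible in one place.

```shell
# Illustrative base-build DDL for the schema-ownership pattern
# (all object, ID and group names are hypothetical).
cat > basebuild.sql <<'EOF'
-- schema owned by an authorization ID of the same name;
-- that ID is the one later assumed via SET SESSION AUTHORIZATION
CREATE SCHEMA APPSCHEMA AUTHORIZATION APPSCHEMA;

-- standard bufferpool and tablespace for the application
CREATE BUFFERPOOL BP8K PAGESIZE 8K;
CREATE TABLESPACE TS8K PAGESIZE 8K BUFFERPOOL BP8K;

-- give the owning ID use of the tablespace
GRANT USE OF TABLESPACE TS8K TO USER APPSCHEMA;

-- let members of the deployment/developer group assume the owning ID
GRANT SETSESSIONUSER ON USER APPSCHEMA TO GROUP DEPLOYGRP;
EOF
echo "wrote basebuild.sql"
```

Once this base build (the only part needing high privileges) is in place, anyone in the group can SET SESSION AUTHORIZATION to APPSCHEMA and create, or CREATE OR REPLACE, objects in the schema.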
This has additional benefits for production deployments. If in production we create a deployment ID, protected by appropriate access controls, as the only ID in the group holding the SETSESSIONUSER privilege, then after initial database creation virtually all deployments can be done with reduced privileges (we have eliminated the need for SYSADM for the majority of deployments).
There are some drawbacks to this approach which must be considered, however. Since we are giving developers access to an authorization ID which is effective over a whole schema, they can create regular database objects such as tables as well as application objects. Some organisations might require these two sets of privileges to be separated. This should be possible with a dual session authorization scenario, as long as the two privilege sets were discrete. The other drawback is that the owner column in the catalog is always set to the session authorization ID rather than the system (connection) user, so it is not possible to see which developer actually created the object. This may be unacceptable: although the information about who defined the changes should be available in the source control system, information about who actually deployed the change will not be.
If you can live with the drawbacks, then this approach can offer considerable benefits in terms of ease of development, particularly with large teams, and reduction in privilege levels for most deployments. Particularly when trying to do automated deployments or continuous integration these benefits can be significant.
In this article we have considered some techniques for integrating the worlds of application development and deployment using commonly available tools. We’ve stressed the importance of source control and have shown how efficiency improvements in the development arena can often also improve the deployment automation process. Perhaps the biggest lesson we have learned is that it is possible to enable DBAs and developers to work together in harmony. Each will have to learn some new techniques and processes, but if there is a willingness to do this then the result will be smoother and higher quality software for their consumers.