Practical Enterprise Agile Data Techniques
With the current interest in DevOps, there has been a renewed focus on streamlining database change and integrating it with associated application change. The topic of “agile database development” is not a new one: the classic book by Scott W. Ambler on the subject dates from 2003, and a wealth of information on the whole subject of Agile Data can be found at Scott’s website http://agiledata.org. In this paper, we will look at one part of the overall topic of agile database development – the concept of database refactoring. We will look at some of the challenges of applying database refactoring in an enterprise environment, and we will also see how the techniques learnt while applying database refactoring in an enterprise environment can help in making wider database changes in an agile way.
What Is Database Refactoring?
The standard definition of database refactoring, taken from Ambler’s book, is “a simple change to a database schema that improves its design while retaining both its behavioural and informational semantics”. We will try to unpack this statement below to understand some of the challenges involved.
“A Simple Change”
It is widely acknowledged that one of the main challenges of database refactoring is that virtually nothing is simple in practice, even if the concept behind it is. IBM has tried to help by steadily increasing the number of changes that can be made with ALTER statements, and made online, but there are still many changes which either require an outage or are otherwise invasive. Any good database professional should know the capabilities of the version of DB2 that they are using and seek to exploit them in order to minimize the impact of change. A typical example of a DB2 feature that makes a big difference is the “CREATE OR REPLACE” option (for stored procedures, UDFs and views) in recent versions of DB2 for LUW, but still sadly lacking in DB2 for z/OS, which means that changing these objects does not usually result in loss of privileges or invalidation of related objects.
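As a brief illustration of why this matters (the object names here are hypothetical), replacing a view in place on DB2 for LUW keeps existing privileges and most dependent objects intact:

```sql
-- DB2 for LUW: redefine the view without dropping it, so existing
-- GRANTs and dependent objects generally survive the change
CREATE OR REPLACE VIEW app.active_customers AS
  SELECT customer_id, customer_name
  FROM   app.customers
  WHERE  status = 'ACTIVE';
```

Without this option, the equivalent change is a DROP followed by a CREATE, after which privileges must be re-granted and dependent objects revalidated.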
“A Database Schema”
Probably the first thing we have to address here is what is meant by a “database schema” in this scenario. To many database professionals, this has a very narrow meaning consisting of only traditional database objects such as tables, indexes and views. But one of the first things you must grasp when working in an Agile Data world is that the database schema goes beyond these to the application code which accesses the data, whether that application code is contained within the database itself (e.g. in stored procedures) or even going beyond this to traditional applications. To work effectively according to this new definition requires the barriers which often exist between database professionals and application developers to come down, and for a new era of cooperation to begin.
The reason why this cooperation is so necessary is, in the words of John Donne, “No man is an island” (from “Devotions upon Emergent Occasions”, published in 1624). If we make a change to a database without taking into account where that database is used, then we will, at least, cause inconvenience and more likely will cause damage or loss. Tracking down all of the applications and people who use a database resource is never easy, particularly in these days of dynamic SQL and ad-hoc queries.
One way to try to tackle this problem is to limit the ways in which a database can be accessed. Most commonly this is done by only providing access through a stored procedure interface. This approach works well for operational databases, and indeed provides additional benefits such as increased security.
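A minimal sketch of such an interface lockdown (all names hypothetical): applications are granted EXECUTE on the procedures and never direct access to the underlying tables.

```sql
-- Remove direct table access for general users...
REVOKE SELECT, INSERT, UPDATE, DELETE ON app.customers FROM PUBLIC;

-- ...and expose the data only through the procedure interface
GRANT EXECUTE ON PROCEDURE app.get_customer TO ROLE app_users;
```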
However, in analytical databases it really isn’t a viable option since much of the benefit of these environments is that the power users have “freedom of expression”. In these environments, it is worthwhile to consider having a widely accessible data dictionary which can be used to publicize proposed changes to schemas in advance, giving users time to adjust their queries accordingly.
“Improves its Design”
The whole point of database refactoring is to make something better. When we say “better” we could mean a number of different things. The most common is probably to improve performance, using a variety of techniques from simple indexing to a wholesale restructuring of the database. It is also common to improve a database by redesigning it to improve its logical design, for example by separating a composite column into its atomic parts or by applying normalization to a table. And since we have defined our database schema as containing not just structural elements, such as tables and indexes, but also functional elements such as stored procedures and triggers, it is commonplace to refactor these as you would any other application program to make them easier to understand or to improve the methodology or algorithm used.
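For example, the composite-column case might look like this sketch (the table, the column names and the fixed-format assumption are all hypothetical):

```sql
-- Add atomic columns alongside the composite 'phone' column,
-- assumed here to hold a fixed 'AAA-NNNNNNN' format
ALTER TABLE app.contacts ADD COLUMN area_code    CHAR(3);
ALTER TABLE app.contacts ADD COLUMN local_number CHAR(7);

-- Populate the new columns from the composite value
UPDATE app.contacts
SET    area_code    = SUBSTR(phone, 1, 3),
       local_number = SUBSTR(phone, 5, 7);
```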
Particularly in the case when functional elements are being refactored it is important that good tests are available to ensure that the results before and after the changes are consistent. Support for automated testing of database change is in its infancy, but it is likely to become more commonplace as more automation occurs across the whole DevOps spectrum. For functional elements such as stored procedures, Data Studio has had a rudimentary testing harness for some time.
There are also open source efforts, such as DB2Unit ( https://angoca.github.io/db2unit/ ), which are trying to plug some of the gaps.
“Retaining both its Behavioural and Informational Semantics”
It is this statement that brings out one of the key challenges of database refactoring. Code refactoring really only has to concern itself with behavioural semantics; that is, that a piece of code being changed gives the same result before and after. It is the need to retain informational semantics that adds the complexity. This means that when you change a database object, the data in that object remains unscathed. As a simple example, if you have to drop and recreate a table to make a change to it then you must ensure that the data in the table is preserved intact throughout the change.
Challenges of Enterprise Database Refactoring
As we mentioned earlier, the recent re-emergence of interest in database change automation has come about as a result of the rise in DevOps. In many ways, a new wave of database developers are revisiting the challenges faced by those of us who were there in the early days of DB2. We soon discovered that creating new tables was easy, but changing them once they had been populated with a significant amount of data wasn’t easy.
For that reason, some of the earliest tools to emerge from a variety of third party vendors were to assist with database change automation. These tools tended to be used in isolation by DBAs, with the only interaction with the application development space being the identification and resolution of package issues related to structural change.
A more recent development is the appearance of application frameworks with integrated database change functionality. One of the earliest of these, and arguably the one which has become the most sophisticated, is the Ruby on Rails framework with its ActiveRecord Migrations (http://guides.rubyonrails.org/migrations.html ). This has been mimicked for virtually all common application programming languages and frameworks. Probably the most significant tools to emerge in this space have been the open source tools Liquibase ( http://liquibase.org ) and FlyWay ( http://flywaydb.org ), which are both Java-based and can be used either standalone or integrated with a variety of frameworks and deployment tools.
A common feature of most of these database change tools is the concept of a “migration”, which is a small step which can be applied or rolled back as a unit. Most of the tools require the definition of a forward (apply) and reverse (back-out) migration, and as such multiple migration steps can be applied or backed out by applying them in sequence. Some of the tools provide modules for generating the DBMS-specific DDL for a migration from inputs specified in a generic Domain Specific Language (DSL). Alternatively, they allow user supplied DDL to be used directly for the migration.
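As a sketch of how such a pair typically looks when user-supplied DDL is used (the file-naming convention shown is Flyway’s, whose paired “undo” scripts are not available in every edition; the table and column names are illustrative):

```sql
-- V2__add_customer_email.sql  (forward migration)
ALTER TABLE app.customers ADD COLUMN email VARCHAR(254);

-- U2__add_customer_email.sql  (back-out migration)
-- Note: on DB2 for LUW, DROP COLUMN leaves the table in a
-- reorg-pending state until a REORG is run
ALTER TABLE app.customers DROP COLUMN email;
```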
And it is here that one of the biggest challenges faced in enterprise environments becomes apparent. The tools all require the input to be DDL, or at least some form of SQL that can be executed against DB2 via one of the provided SQL processing APIs (JDBC or the variety of drivers supported by CLI).
The need to preserve informational semantics presents an issue whenever the change requires an “extended alter” (i.e. it cannot be achieved with a simple ALTER statement) and when the data volume is significant. The most common method employed by these tools to apply such changes involves creating a new table with the required new structure, moving the data into it using an INSERT from SELECT statement, and then realigning existing structures using either renames or drops and recreates. The easiest way to describe this methodology in an enterprise system is “death by logging”.
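Sketched with hypothetical names, the pattern looks like the following; the INSERT from SELECT is fully logged, which is exactly the problem at enterprise data volumes:

```sql
-- 1. Create the target structure
CREATE TABLE app.orders_new (
  order_id   INTEGER   NOT NULL,
  ordered_at TIMESTAMP NOT NULL,   -- was a DATE in the old table
  PRIMARY KEY (order_id)
);

-- 2. Copy (and convert) the data: every row is logged
INSERT INTO app.orders_new (order_id, ordered_at)
  SELECT order_id, TIMESTAMP(order_date, '00:00:00')
  FROM   app.orders;

-- 3. Swap the tables into place
RENAME TABLE app.orders     TO orders_old;
RENAME TABLE app.orders_new TO orders;
```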
Even when a change can be achieved using an ALTER, great care has to be taken. The AREO* (Advisory Reorg Pending) status from DB2 for z/OS is an obvious example of the side effects that such changes cause. Another example is the effect of changes being propagated to history tables in system temporal structures (we have found that this works slightly differently in DB2 for LUW and DB2 for z/OS, particularly with respect to default values).
So perhaps the biggest challenge with applying database refactoring in an enterprise environment is doing it in a way which is appropriate for the volumes of data involved. Additionally, in many environments, the change must be carried out either in a way which allows application workload to continue to function or must be completed within a limited dedicated change window.
We have found that tools which integrate well with application change automation often do not use database change techniques which are suitable for the enterprise. On the other side, enterprise database change tools are normally standalone in nature, and do not integrate well with application change automation. It is expected that both sides will improve over the medium term. In the meantime, there is scope for database professionals and application automation specialists to work together to come up with bespoke solutions, always keeping in mind that, as better tool support becomes available, you should aim to replace your bespoke efforts with more standard tooling.
Tips for Easier Enterprise Database Refactoring
Here we present some suggestions for making enterprise database refactoring easier in your organisation. These have all come about as a result of direct experience within a number of organisations.
Collaboration Is Key
The time for maintaining fences between database and application specialists has long since gone. It is time to leave our “ivory towers” and get actively involved in working together to achieve the goal of delivering the best software solutions for our organisations and clients.
As database professionals, we need to widen our horizons, and get to know more about the current state of application development best practices, and to actively participate in initiatives such as DevOps. We have a lot to bring to the party, but if we come along with the viewpoint that we will be able to continue to work as we have always done, then we are not going to help.
Don’t expect that you will be the teacher and the developers will be your pupils. In some areas, this will be the case, but you often will have as much to learn from them as they do from you. Our experience is that in the last few decades, application development methods have changed significantly while DBA techniques have hardly altered, so we have quite a bit of catching up to do. Of course, we still have to ensure the same key outcomes, particularly with respect to data integrity and security, but the ways we work to achieve these goals should probably change.
Get to Know Your Database Technology
One of the key principles of agile methodologies is that change occurs often. If this is going to be possible in an enterprise environment, then you, as a database professional, must exploit all the functionality at your disposal.
You need to be well aware of every feature of the DBMS you are using which will allow you to make a change as effortlessly as possible.
If you have third-party tools at your disposal, then you also need to know their capabilities and how they can be used to achieve this goal.
You should understand how to integrate your existing tools with the application development tools in as seamless a way as possible.
Here are some thoughts on areas where some quick wins could be made –
- Know exactly what can be achieved without having to rely on the multi-step processes often called “extended alters”. Whenever you upgrade to a new version of DB2, ensure you re-evaluate the possibilities and rework your processes to take advantage of any new features which are available. Ensure that you thoroughly test the use of each new feature in a sandbox environment to know exactly how it works, and document all the steps required when using this feature (e.g. the need to reorg after many ALTER statements).
- Know the capabilities of your utilities, whether from IBM or third-party vendors. In particular, understand how to use online utilities (ones which will allow full or partial access to the data by applications while making changes) and seek to use these wherever possible. Make sure to use options of utilities which allow you to consolidate multiple utilities into one, such as taking a backup and gathering statistics at the same time as you are loading data.
- Get to know the DB2 command line tools, even if you are a DB2 for z/OS DBA. It is often easier to integrate database changes with application changes if you can execute the required changes from a DB2 client rather than from a more traditional DBA environment. The newer CLPPLUS command processor is strongly recommended, both for its enhanced capabilities and its good support for functional components, such as stored procedures, produced using Data Studio.
- Become familiar with alternative methods of executing utilities. This is really an adjunct to the previous suggestion. Using utilities as part of an enterprise database change is virtually a necessity, either because the change requires a utility (e.g. a reorg to clear an AREO* flag in DB2 for z/OS) or because an extended alter could not be achieved with enterprise data volumes without using an UNLOAD (z/OS) / EXPORT (LUW) followed by a LOAD. We have found that a knowledge of the ADMIN_CMD (LUW) and DSNUTILU (z/OS) stored procedures has been an excellent way of bridging the gap between traditional methods and integrated database / application deployments.
- Work with your database tool vendors on enabling integration with DevOps / automated deployment tools. If this support is not currently available and you need to integrate database changes into automated deployments, then consider using your existing tooling as an advisor for what needs to be done, with what it produces forming the basis of the automated deployment script. On the other side of the equation, work with your deployment tool vendors to ensure they provide support for common database tools, or that they provide the capability to execute jobs from environment specific execution tools (JCL on z/OS, shell scripts on UNIX, etc.).
- Particularly when you are exploiting recent DB2 functionality, never blindly assume that any tool, no matter how sophisticated, will always get a change right. Always manually check what it has produced and then validate in a safe environment first.
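To illustrate the ADMIN_CMD approach mentioned above: because it is an ordinary stored procedure, any deployment tool that can open a SQL connection can drive utilities through it. A minimal sketch (the table name is hypothetical):

```sql
-- DB2 for LUW: invoke utilities over a normal SQL connection,
-- so a deployment tool with only JDBC/CLI access can run them
CALL SYSPROC.ADMIN_CMD('REORG TABLE app.orders');
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE app.orders WITH DISTRIBUTION AND DETAILED INDEXES ALL');
```

DSNUTILU plays an equivalent role on DB2 for z/OS, taking a utility control statement as its input.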
Don’t Forget Recovery
Always ensure that when you produce a database refactoring that you consider how you can recover from this in the event of a problem. There are two aspects to this –
Failure during execution: if one part of a refactoring fails, what is the process you need to follow to either back out to the start of the change or restart the process. This is where many of the traditional database tools shine, with their automated checkpoint generation and restart capabilities. If you are not using such tools, then you need to understand how you would achieve the equivalent result yourself.
Problems following execution: if the database change succeeds but for some reason needs to be undone (often because of a problem with a related application change) then you need to have the ability to get back to where you started. It is our experience that in many cases the back-out process is much more involved than the original change. For example, we’ve often found that a forward change that can be made using an ALTER statement needs a sophisticated “extended alter” to revert.
In both of these scenarios, preparation is key. You should test both the deployment and back-out well ahead of time. Again, understanding all the options available via the version of DB2 and the tooling you have available is important. For example, particularly with DB2 for LUW, the easiest way to back-out (if your operational environment will stand it) is to restore a backup taken immediately before the change was applied.
Another option which some sites consider is actually leaving a database change in place even if an application change is backed out. Normally this does involve some degree of compromise (e.g. you may need to make some columns nullable that you would prefer not to be), but it may be the only way to achieve your change within the time available. In some cases a compromise might only need to be temporary. For example, you may consider adding a column as nullable initially and then, once the application change has been proven, make a subsequent change to mark it as not nullable (we will cover this more generally next).
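The two-step compromise described above might look like this sketch (names hypothetical; note that on DB2 for LUW, SET NOT NULL can leave the table in a reorg-pending state until a REORG is run):

```sql
-- Step 1: deploy with the column nullable, so the database change
-- can stay in place even if the application release is backed out
ALTER TABLE app.orders ADD COLUMN sales_channel VARCHAR(10);

-- Step 2: once the application change is proven and the column
-- is fully populated, tighten the constraint
ALTER TABLE app.orders ALTER COLUMN sales_channel SET NOT NULL;
```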
Consider a Transition Period If Required
It is often the case that attempts at database refactoring fail because they have too great an impact to achieve at one time. Most typically the item being refactored is widely used, and changing all the application code related to it at one time is not practical. One way round this is to make the change in two or more steps, with a transitional period during which both the old and the new states are supported. For example, if you wanted to change a multi-purpose column into multiple single-purpose columns you could retain both for a period, with the two column sets being kept in sync using triggers.
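A sketch of such a transition (all names hypothetical): the old composite column and the new atomic columns coexist, with a trigger keeping them aligned while applications migrate.

```sql
-- Keep the new atomic columns in sync whenever the old
-- composite column is still being written by legacy code
CREATE OR REPLACE TRIGGER app.sync_phone_parts
  NO CASCADE BEFORE UPDATE OF full_phone ON app.contacts
  REFERENCING NEW AS n
  FOR EACH ROW
  SET n.area_code    = SUBSTR(n.full_phone, 1, 3),
      n.local_number = SUBSTR(n.full_phone, 5, 7);
```

A matching INSERT trigger would be needed as well, and possibly the reverse direction if some applications have already moved to writing the new columns.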
Obviously, maintaining two states in such a manner is an overhead that you would not want to support indefinitely. So, the success of such a method involves clear communication of the timescales involved, and if possible automated reporting of applications still using the old structures (hint: autonomous transaction support is very useful for adding this type of logging).
If it is possible to perform a refactoring in one step, then so much the better. But it is better to make an improvement in two steps than not to make it at all.
Hopefully this article will have given you something to think about in terms of getting involved in the new era of collaborative development. While this article focused on database refactoring, many of the topics covered here apply equally well to agile schema evolution (making changes which alter the behavioural or informational semantics of your database). There is also so much more to learn about agile database development, so I’d once again encourage you to check out http://agiledata.org for more information.
If there is one key message that I’d like to leave with all database professionals it is that there are great benefits, and a lot of professional satisfaction, to be gained from working more closely with application developers. Go out there, humbly and helpfully, and work together to make things better for your organization.