Sometimes a database must be moved to a new physical home. This can mean new processors, new disk subsystems, or sometimes just a new design and architecture on the same hardware or the same type of hardware. This article discusses these migrations and some of the techniques and tools available to migrate databases and, in the process, change any number of aspects of the database. It then describes a complex database migration and its implementation.
Reasons for Database Migrations
There are many reasons a database might require migration. Understanding the reason you need to migrate is the first step in planning a successful migration. Some typical reasons include:
- The database has outgrown the current hardware
- A new architecture provides needed functionality
- Data center migrations
- Moves into the cloud
- The need for better performance
- Costs (software, hardware, support)
- New applications and merged database designs
Types of Database Migrations
Understanding the type of database migration you are undertaking will also provide insight into the alternative techniques available or techniques which will not work. I classify all migrations into one of these types:
- Platform change
- Architecture change
- Endianness Change
Platform Change migrations are changes that involve moving to new hardware, such as new processors or new disk. Platform changes can often be very simple migrations if there are no underlying architecture or configuration changes also involved. The typical platform change is just an upgrade to faster processors or faster disk. If the change is limited to processors, the data in the database does not have to move and the disks on which the database resides can be swung over to the new processors. This will require a short outage in most cases. If the change involves moving to new disk, the data can usually be moved with disk copy technologies or backup/restore processes.
Architecture Changes include changes in physical characteristics such as number of database partitions, moving to/from pureScale or changes in RDBMS software, such as a change from Oracle to DB2 or from Mainframe DB2 to DB2 LUW. These types of changes require unloading the data and reloading it, although in some cases heterogeneous replication can be used to copy the data.
If the destination hardware is sufficiently different, the move may also qualify as an Endianness Change, such as moving between Intel (Little Endian) and Power (Big Endian). Note that IBM’s Power8 family recently added a Little Endian option for some Linux implementations. Endianness changes require a complete unload and reload of the data. Replication-based technologies that transform log records into new SQL statements applied to the target database can also be used; these include Q-Rep, SQL-Rep and Change Data Capture. However, DB2’s HADR is not in this list, because the transaction replication HADR uses transmits the data in its internal format, thereby preserving the underlying byte order.
Planning the Migration
Thorough planning is the key to minimizing problems during a migration. The plan should start with defining the business objectives and identifying constraints such as downtime windows, costs, tools and available skill sets. The following steps should be considered when planning the migration:
- Define business and technical objectives
- Identify technical constraints
- Identify potential techniques
- Obtain appropriate assistance and advice
- Analyze each potential technique
- Plan for the required network and client software changes
- Develop cost budget and planning estimates
- Evaluate alternatives
- Test solutions (if needed)
- Select a solution
- Implement (strong project management required)
- Evaluate the implementation
When moving a database, minimizing downtime is often critical. There are several techniques that can be used to minimize downtime, but the type of migration will affect which of these techniques is viable. If the data must be completely unloaded and reloaded, the migration will be much more time consuming and complicated than if the database migration is not changing the underlying data structures.
Options for minimizing downtime include:
- Pre-migrate as much data as possible prior to the cutover weekend. I call this a “Texas Two-step” migration. Select some criteria for identifying static data and move as much of that data as possible prior to the “go live” cutover.
- Replication products, such as Q-Rep, SQL-Rep or Change Data Capture, can be used together with data copies or database backup/restore to replicate changes after a point in time. Replication can virtually eliminate downtime: take an online backup and immediately begin replication, capturing changes; then restore the backup and begin applying the replicated transactions, bringing the databases in sync with each other.
- High Performance Unload is a very powerful tool that can unload data and load it into the target machine. It is much faster than export or a cursor-based select, and it can also unload from backup images.
There are many tools available that can be used in migrations under different circumstances. The chart below lists many of the tools and techniques and describes how each affects a migration and when it can be used.
Keeping the migration plan as simple as possible will go a long way toward the success of a migration. For example, if a backup and restore will work, that is much simpler and less prone to problems than a table-by-table unload/reload, even when tools such as IBM’s High Performance Unload are used to automate the unload and reload. However, there are times when table-by-table copies are required, and a table-by-table copy does provide a lot of flexibility.
Keep it Simple or Exploit Flexibility?
“Keeping it simple” is a good approach whenever moving an important database. If you are changing architectures or platforms and must use a table-by-table copy, consider the trade-off between making no optional changes and implementing additional features or function while migrating. Many installations have very few chances to unload and reload their data, so there is a strong temptation to make desired changes or exploit new features during a migration. The risk of impacting performance, or of missing migration timelines, because of optional but valuable changes should be weighed very carefully.
Some of the optional changes that are often considered include:
- DBMS version upgrades. New versions often have valuable new features, but if implementing those features will drive effort, complication and costs up, the migration may not be the time to upgrade.
- Table/data reordering. Sorting your data during unload shifts CPU work to the source machine, while the target machine is often swamped writing new data pages out to disk.
- Different features. Reloading your tables provides the opportunity to change page sizes, tablespace definitions, range partitioning or MDC characteristics.
- Enabling compression or changing compression techniques.
- Converting tables to COLUMN organization (in DB2 10.5 BLU)
- Defining different indexes. Indexes must be recreated anyway, so a migration may afford you the chance to modify your indexing, for example converting indexes from non-partitioned to partitioned or adding INCLUDE columns.
Many times the amount of RAM on the machine or the disk performance characteristics will also change during the migration. It is important to carefully plan for the impact these changes will have. The best way to manage them is a baseline performance test: a set of performance tests that you run on your current hardware prior to the migration and repeat on the new hardware before going live.
Table-by-table migrations provide the most flexibility, but they are also the most complex. If migrating table by table, a good plan is to use a process that automates recording the progress and status of each table migration. I recommend scripts that use DB2 tables to record the progress of each step of the migration:
- Create migration status tables in DB2 to track the status
- Pre-seed the tables with the names of the tables or objects to migrate
- Have the scripts that actually move the data update the DB2 tracking table:
  - at the beginning of each table migration
  - at the end of the migration
  - on validation of the data (compare row counts in source and target)
  - on any error during the table move
- Keep the start/stop times
Using DB2 tables to track the progress makes it easy to report on how many tables have moved, the speed of the migrations, failures, etc., all using SQL!
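As a sketch of this approach, a tracking table and a couple of reporting queries might look like the following. The schema, table and column names here are hypothetical, not taken from the actual migration:

```sql
-- Hypothetical tracking table: one row per table (or per database
-- partition of a table) to be migrated
CREATE TABLE MIG.TABLE_STATUS (
    TABSCHEMA   VARCHAR(128) NOT NULL,
    TABNAME     VARCHAR(128) NOT NULL,
    DBPARTITION SMALLINT     NOT NULL DEFAULT 0,
    STATUS      VARCHAR(10)  NOT NULL DEFAULT 'PENDING',  -- PENDING / RUNNING / DONE / FAILED
    SRC_ROWS    BIGINT,                                   -- row count on the source
    TGT_ROWS    BIGINT,                                   -- rows loaded on the target
    START_TS    TIMESTAMP,
    END_TS      TIMESTAMP,
    ERROR_MSG   VARCHAR(1024),
    PRIMARY KEY (TABSCHEMA, TABNAME, DBPARTITION)
);

-- Progress report: how many objects are in each state and how many
-- rows have been loaded so far
SELECT STATUS, COUNT(*) AS OBJECTS, SUM(TGT_ROWS) AS ROWS_LOADED
FROM MIG.TABLE_STATUS
GROUP BY STATUS;

-- Validation: any completed object whose source and target counts disagree
SELECT TABSCHEMA, TABNAME, DBPARTITION, SRC_ROWS, TGT_ROWS
FROM MIG.TABLE_STATUS
WHERE STATUS = 'DONE' AND SRC_ROWS <> TGT_ROWS;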
Managing the migration process is the most important aspect of any migration. One needs both strong personnel management to ensure proper application/user involvement in the testing as well as strong management of the tracking of all DDL and data changes. With thousands of tables to migrate, automation is mandatory.
Case Study – DB2 DPF Migration from Linux Intel to AIX Power
In this example migration, the technology on the source system was old and outdated. The database was a DB2 9.7 database with DPF running on Linux on Intel. IBM’s Power8 technology was selected as the new platform, running AIX. The change from Intel to Power required a complete unload-reload strategy due to the Endianness change.
The migration team decided to keep the migration as simple as possible. It was felt that moving to AIX on Power with a new disk subsystem was already so much change that eliminating as many variables as possible would make the migration less risky, and it would be easier to manage performance characteristics if few other changes were implemented during the migration and post-migration performance management. So no upgrade to DB2 10.5 or change in the number of database partitions would be implemented during the migration. Some changes did make sense: the data was reordered during the migration, and buffer pools and database configuration parameters were adjusted to make use of the additional memory.
Business and technical challenges included:
- Minimizing downtime – a 60 hour window was the largest available
- No ability to pre-migrate any significant portion of the database (no static data)
- Thousands of tables
- Over 110TB of raw data
- Only 1Gb network connections into the source system
A backup-restore-CDC replication solution was considered but rejected as too slow: there was concern about the performance impact on the source systems, and it might have been prone to problems in the DPF environment. High Performance Unload was also considered, but rejected because of the licensing cost involved.
A solution was developed that would keep the source system up and available in Read-Only mode during the migration cutover. The process was to use a temporary server as a staging machine with faster network connections to move the data over to the new machine:
- Back up the database
- Restore the backup to a temp server
- Log ship to the temp server
- Roll forward the temp server to bring the two databases in sync
- At the beginning of the migration:
  - Disable update capability on the source
  - Stop log shipping and take the temp server out of rollforward mode
  - Perform the migration from the temp server using load from cursor
  - Allow read-only queries on the source system
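In DB2 command terms, the staging flow might be sketched as follows. The database name, paths and backup timestamp are placeholders, and the exact options will vary with the environment:

```sql
-- On the source: take an online backup, including the logs
BACKUP DATABASE proddb ONLINE TO /stage/backup INCLUDE LOGS;

-- On the temp server: restore the backup, which leaves the database
-- in rollforward-pending state
RESTORE DATABASE proddb FROM /stage/backup TAKEN AT 20150601120000;

-- Repeatedly apply the shipped logs to keep the temp server nearly current
ROLLFORWARD DATABASE proddb TO END OF LOGS OVERFLOW LOG PATH (/stage/logs);

-- At cutover: stop log shipping, then complete the rollforward so the
-- temp server can be used for the load-from-cursor migration
ROLLFORWARD DATABASE proddb TO END OF LOGS AND COMPLETE OVERFLOW LOG PATH (/stage/logs);
```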
We developed a Migration Tool which used several control tables to manage the process. The tool created DB2 tables pre-populated with the source and target database and table names. We federated the source and target databases and created several scripts:
- Full table load from cursor (used for small tables)
- Single database partition of a table load (for loading large tables in parallel)
- Count of a source table or of a database partition of a table
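The full-table scripts were built around DB2’s load from cursor. With federation in place, the cursor can select from a nickname that points at the source table; a minimal sketch, with illustrative object names, looks like this:

```sql
-- SRCNICK.SALES_HISTORY is a federated nickname for the source table;
-- PROD.SALES_HISTORY is the local target table (both names illustrative)
DECLARE mycurs CURSOR FOR SELECT * FROM SRCNICK.SALES_HISTORY;

LOAD FROM mycurs OF CURSOR
     REPLACE INTO PROD.SALES_HISTORY
     NONRECOVERABLE;
```

NONRECOVERABLE avoids logging overhead during the bulk load; a full database backup at the end of the migration covers recoverability.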
Each script tracks the progress of the table or table database partition it is loading by updating the DB2 tables. The scripts post-process the output of the load, identify SQL errors, capture the counts of rows loaded and committed, and update the tracking table.
We also created a “Stream Driver” script. The Stream Driver script selects tables to be migrated using SQL to read from the DB2 tables. Once selected, the Stream Driver script submits either a Full Table load or Single Partition load. If it submits a Single Partition load, it sends the script to the correct physical node on which the database partition resides. The stream driver script also submits a script which populates the tracking table with the count of source rows.
Concurrency was achieved by having multiple Stream Drivers running at the same time. The Stream Drivers include logic to prevent duplicate stream drivers from initiating a load against the same table or table’s database partition.
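One simple way to implement that claim logic, assuming a tracking table like the one sketched earlier (names hypothetical), is an update that flips the status from PENDING to RUNNING. Because only one such update can succeed, concurrent drivers cannot claim the same object:

```sql
-- Attempt to claim a specific table/partition: exactly one concurrent
-- driver will see "1 row updated"; the rest see zero rows and move on
UPDATE MIG.TABLE_STATUS
   SET STATUS   = 'RUNNING',
       START_TS = CURRENT TIMESTAMP
 WHERE TABSCHEMA   = 'PROD'
   AND TABNAME     = 'SALES_HISTORY'
   AND DBPARTITION = 7
   AND STATUS      = 'PENDING';
```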
A similar process was followed to automate the creation of indexes. One complication of index creates is that you can only create one index on a table at a time. Concurrently running index-stream drivers had to ensure that they would not attempt to create indexes on the same table at the same time.
The overall process followed was:
- Use db2look to extract the DDL from the source system
- Build the target system with the DDL. This was done multiple times for testing purposes. Prior to the final run, the tables were deliberately not dropped and recreated on the target system, so that the compression dictionaries built during the test runs could be reused, speeding the data migration process and minimizing the outage window.
- Some ownership and authorizations were changed to match userid differences on the source and target (as well as to remove some users’ authorizations which shouldn’t have been there in the first place).
- At Migration Outage Window start, quiesce target database to prevent users from connecting, and bring Temp Server out of Rollforward pending status.
- Drop some objects to facilitate faster data transfers:
  - Foreign keys and other constraints were dropped on the target system.
  - Indexes (except primary key indexes) were dropped on the target system. Building indexes during the load command is the fastest technique, but we experienced tempspace shortages when running concurrent loads with all the indexes being created at the same time.
  - MQTs on the target were dropped.
- Load tables. Statistics collection is normally worthwhile during table loads, but to speed the loads and minimize the outage window, statistics were not collected. Statistics for some tables had been saved from prior test runs, and those statistics were still valid.
- Selectively REORG tables to improve compression
- Recreate indexes (collecting statistics during the index creations). Consideration was given to using db2look to collect mimic statistics and reuse them instead of collecting statistics during index creation, but complications such as keeping data and index statistics consistent and managing the index statistics updates made it not worth the effort.
- Recreate constraints
- Recreate foreign keys
- Recreate MQTs
- Drop Nicknames used to count the source tables
- Backup Database
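Several of the steps above map directly onto LOAD options. A sketch of a final-run load (object names illustrative) combines STATISTICS NO with the KEEPDICTIONARY modifier so that the compression dictionaries built during rehearsals are reused:

```sql
-- Final-run load: skip statistics collection and keep the compression
-- dictionary built during rehearsal runs (object names are illustrative)
DECLARE mycurs CURSOR FOR SELECT * FROM SRCNICK.SALES_HISTORY;

LOAD FROM mycurs OF CURSOR
     MODIFIED BY KEEPDICTIONARY
     REPLACE INTO PROD.SALES_HISTORY
     STATISTICS NO
     NONRECOVERABLE;
```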
The migration successfully moved 152 billion rows of data in under 22 hours. The creation of indexes, foreign keys, other constraints and MQTs took several additional hours, but the whole migration was completed in the allotted time window. Of note: the automation in the scripts allowed us to run 225,086 independent load jobs over 22 hours and track the status of each one. Only two failed, and those problems were resolved.
Other lessons learned include:
- Check table compression after migration and REORG if needed (our threshold: 60%)
- Build indexes, primary keys, foreign keys and constraints after the table loads, to minimize locking on the catalog when multiple database partitions load concurrently
- Capture VMSTAT or SAR output, monitor real time and invoke additional stream processors if CPU is available.
- Heaviest CPU consumption is on Target Machine
- With DPF, you must compare partition maps (PMAPs). If different, load source PMAP into Target before creating objects!
- Enable FCM Parallelism
- Increase SHEAPTHRES_SHR, SORTHEAP and UTIL_HEAP_SZ for loads and index creates
- Use rehearsals to identify longest running table migrations
- Use “streams” to increase Parallelism
- Avoid recreating tables prior to Cut-over. Use KEEPDICTIONARY – speeds data load
- Keep Data Statistics collected during rehearsal runs
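For the memory-related items, the adjustments are ordinary database configuration updates. The values below are purely illustrative and must be sized for the target machine:

```sql
-- Raise utility and sort memory for the load/index-build phase
-- (values are placeholders; tune for the actual hardware)
UPDATE DB CFG FOR proddb USING UTIL_HEAP_SZ 524288;
UPDATE DB CFG FOR proddb USING SORTHEAP 200000;
UPDATE DB CFG FOR proddb USING SHEAPTHRES_SHR 1000000;
```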
Database migrations can be challenging. However, if they are well planned, rehearsed and thoroughly tested, seemingly impossible migrations can be performed without problems. Automation is extremely important when complexity is high. Keeping the migration as simple as possible, and strongly managing both the migration and the testing processes, will lead to a successful migration.