Integrated Transaction and Analytics Processing on IBM z Systems
STSM, z System Performance and Design
STSM, z Analytics
IBM Silicon Valley Lab
Conventional wisdom has for decades prescribed multi-layer architectures for transaction and analytics processing: core business systems run transactional applications generating massive amounts of data that get consolidated in staging areas and operational data stores, prepared for reporting and analytics in enterprise data warehouses, subsets of data further prepared for specific purposes such as data mining in datamarts. The result is an extremely complex infrastructure with costly and time-consuming ETL processes, a high degree of data duplication, challenging system management with different tools and skills needed in every layer, no consistent data governance across the enterprise, and so on. Effectively, there is no single, up-to-date source of data that provides a coherent, consistent and current view of the enterprise.
In January 2014, Gartner published a report titled “Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation” that suggests that a new architecture is needed that combines transaction and analytical processing in a single, hybrid system – and this new architecture will enable the next level of real-time analytics resulting in better business processes and simplification of the information management infrastructure:
“Hybrid transaction/analytical processing will empower application leaders to innovate via greater situation awareness and improved business agility. This will entail an upheaval in the established architectures, technologies and skills”.
“This architecture will drive the most innovation in real-time analytics over the next 10 years via greater situation awareness and improved business agility”.
“The emergence of HTAP means IT leaders must identify the value of advanced real-time analytics, and where and how these enable process innovation. By eliminating analytic latency and data synchronization issues, HTAP will enable IT leaders to simplify their information management infrastructure”.
A key imperative when creating an integrated system for transaction and analytics processing is: “Don‘t move the data!” – move analytics processing to the data instead…
So the question is: Where is the data? In many enterprises, the answer is z Systems. For example, more than 2/3 of the business transactions for US retail banks run directly on mainframes and 80% of the data accessed for analytics originates on z Systems.
For a couple of years – long before Gartner published the report quoted above – we have started to develop a vision of integrated transaction and analytical processing on z Systems. Over the years we have implemented and delivered key technology for this strategy and new, important additions will become available in 2015.
Gartner recognizes IBM as a leader in this space in a report “Magic Quadrant for Operational Database Management Systems” published in October 2014 and explicitly mentions our realization of an efficient HTAP architecture:
Performance and features: Survey participants rated IBM highly (among the top three vendors) for HA/DR and overall performance. [..].
Hardware integration: DB2 for z/OS dynamically routes analytics to the IBM DB2 Analytics Accelerator for z/OS, (Accelerator), creating an efficient HTAP architecture in a single environment and reducing mainframe MIPS to cut operating charges. [..].
In a nutshell, the z Systems strategy delivers a hybrid transactional and analytics processing environment that
- Builds on a proven platform with “transaction level” quality of service (QoS) including 99.999% availability, proven to scale up to 600 concurrent users, consistently high performance at 100% capacity, EAL5 security,…
- Exploits the latest data for your analytics, leveraging the high qualities of service, security and accelerated performance.
- Brings analytics to where 80% of the transactional data used for analytics originates.
- Is proven to support, real time scoring up to 12K transactions per second.
- Executes complex query execution times up to 2000x faster.
- Eliminates the expense associated with query tuning; lowers the cost of managing and processing historical data, reduces host storage costs by up to 95%; dramatically reduces ETL costs by minimizing data movement; and reduces the cost of analytics infrastructure and administration up to 50%.
- Speeds up the creation or update of predictive models by a factor of 10 or more.
Figure 1 depicts the z Systems strategy – and the rest of this article will briefly explain the key technology elements implementing our strategy and provide some initial proof points that it actually works well.
Figure 1 The z Systems strategy: Integrating operations & analytics in one streamlined, end-to-end data lifecycle
The center piece for all of these technologies is DB2 for z/OS with the integrated IBM DB2 Analytics Accelerator for z/OS, (Accelerator). It is the base for other software products and solutions to implement the integrated transaction/analytics strategy by providing a hybrid database system that serves both operational and analytical applications on a single version of centrally managed data.
The enterprise data warehouse
Online transaction processing (OLTP) has been and continues to be the stronghold for z Systems for a very long time while – following conventional wisdom and due to missing appropriate technology – data warehouses have been created off-platform. The development of the IBM DB2 Analytics Accelerator for z/OS, (Accelerator), has introduced the technology base to keep – or move back – the warehouse to System z.
A number of criteria have to be examined to show the validity of this approach.
First, performance of complex analytical queries and scalability of the database are key requirements for an enterprise data warehouse. The Accelerator has proven that it can run complex queries, such as when creating Cognos reports, up to 2000x faster than a traditional DBMS. This takes down the execution time for reporting applications from hours to minutes or seconds, which not only accelerates existing applications but allows for new, interactive business analytics running ad-hoc analytic queries on current data at any time. The Accelerator also provides the scalability required for even the largest enterprise data warehouses. It has been successfully deployed for databases reaching hundreds of terabytes in size.
Second, adding the new analytic workload to an existing operational environment must not affect the SLAs of transaction processing on the same systems. The Accelerator also delivers on that requirement: complex analytical queries are issued against DB2 z/OS – but the DB2 Optimizer recognizes these queries and transparently offloads the query processing to Accelerator. The impact on the operational workload caused by this is negligible. In a test that models a real-life situation, we ran an operational workload on a zEC12 system with two LPARs with 8 CPUs and 120 GB memory each that was processing 20.5 million transactions per hour. We then added an analytical workload simulating a mix of queries reflecting various Cognos reports, running at 13,000 queries per hour. These queries were issued against DB2, but executed in an attached Accelerator. Our measurements have shown that the operational throughput could be maintained with no additional mainframe capacity required – which is a fantastic result.
Figure 2 Running operational and analytical workloads on the same system
Third, we took a look at the data currency in the warehouse. In the previous scenario the data used for reporting was a snapshot of the operational system that was refreshed in a nightly batch window. The next question was, of course, whether we can keep the data in the warehouse up-to-date – also without impacting the operational workload. In our test, we set up change-data-capture for the operational system and kept the data in the warehouse in sync near real-time, i.e., with only a few minutes delay changes in the operational system got reflected in the data warehouse – ready for analytical processing. Our measurements of this scenario have shown a minimal degradation in transaction throughput of about 3%. If this would not be acceptable, it could be compensated by allocating about 3% more resources to the operational system. Again, this is a fantastic result and demonstrates a unique capability of our approach – a result that would hardly be achievable with an off-platform data warehouse.
Figure 3 Adding an analytical workload to an existing operational workload with data kept in sync near real-time
“In-transaction analytics” means executing advanced analytics in every transaction while preserving SLAs. An example is fraud prevention for payment transactions. The goal is to closely examine each incoming payment request and check for potential fraud – at a rate of hundreds or thousands of transactions per second. This is extremely challenging and typically lots of compromises are made to be able to sustain the transaction rates. For example, instead of using decision management with complex business rules and scoring using predictive analytics models, only very simple, efficient, hand-written rules are applied during actual transaction processing, or a only a subset of the transactions, e.g., those exceeding $100, are examined. The rest of the payments will be analyzed later on in batch processes, potentially resulting in the blocking of a credit card – but at a time when some fraudulent payments have already been authorized and a loss occurred.
It is obvious that being able to perform full fraud detection logic on each and every transaction during payment authorization would generate very significant cost savings. In order to implement this, we need to consider various aspects of the problem:
First, the data required as input parameters to check fraud consists of information from the operational system, e.g., information about the latest five transactions, customer data, and some historical and/or aggregated data typically maintained in the warehouse such as average amount of transactions over the last week/month/quarter, locations the credit card was typically used, etc. If the data warehouse is off-platform, obtaining the latter data requires a connection to the remote database via network which results in latencies measured in tens of milliseconds – which is not feasible when thousands of transactions have to be processed per second. But there is a better solution: if the warehouse runs on DB2 – see the section above –, the operational system can access the data in the warehouse directly from its DB2 instance using data sharing. Latency now goes down to microseconds, which is exactly what we need.
Second, we need to calculate one or more scores for fraud models using the input data. Again, we could call a scoring service in a distributed environment, such as the real-time scoring service provided by SPSS C&DS which can be called through a Web-services interface from within a DB2 transaction. The latency of such a scoring call again is in the range of tens of milliseconds – and therefore hardly suitable to score thousands of transactions per second. Also, it can be quite difficult to set up a distributed scoring service in a way that it smoothly scales with the transaction workload.
A better approach is to use the SPSS Scoring Adapter for DB2 z/OS. The scoring adapter provides a DB2 UDF (user-defined function HUMSPSS.SCORE_COMPONENT) that can be used anywhere in a DB2 transaction. Instead of calling an external scoring service, this UDF calls the SPSS scoring algorithm running locally inside DB2 z/OS. Scores are executed 10-100x faster than making distributed system calls to remote scoring engines. Also, the usage of the scoring UDF scales nicely with the DB2 system without any special effort.
Figure 4 shows the results of a throughput test for the SPSS Scoring Adapter for DB2 z/OS. The numbers do not reflect full payment transactions, but only the calls of the DB2 scoring UDF itself. The execution time for performing a single score is significantly less than 1 millisecond, scaling is almost linear – 8 zEC12 CPUs can calculate more than 13.000 scores per second.
Figure 4 SPSS Scoring adapter for DB2 z/OS performance results
An interesting fact is that when we compared the CPU cost for the scoring UDF with the CPU cost of calling an external scoring service through a Web-services interface, we found that the CPU cost for processing the protocol stack (TCP/IP, Web-services,…) is higher than the cost of native scoring. Interesting – saving MIPS by on-loading functionality.
We were able to show that using this technology, analytic scoring can be done within the scope of an OLTP transaction with negligible impact to SLAs while operating on current data (last committed data).
Real-time scoring based on the SPSS Scoring Adapter for DB2 z/OS is an integral part of a number of available System z business solutions:
- IBM System z Smarter Analytics for Banking - anti-fraud and anti-money laundering focus.
- IBM Signature Solution - anti-fraud, waste and abuse for Healthcare and Insurance.
- IBM Signature Solution - anti-fraud, waste and abuse for Tax.
- IBM Signature Solution for Next Best Action.
In-database transformation in the accelerator
To satisfy different workload needs, OLTP source systems data and other application systems data often had to be accessed, cleansed, transformed, and loaded into the data warehouse system. Processing steps required to transform data from operational to reporting-optimized structures are referred to as extract, transform, and load (ETL) and extract, load, and transform (ELT) processes. The difference between ETL and ELT is about the system actually performing these transformations. Within ETL processes, the transformation is performed with specific transformation tools at a system that can either be the source system or a specialized transformation system.
The ELT processes transform data into target structures within the target system. Inserting transformed data into a DB2 for z/OS table while selecting data from another system is considered to be the transformation step of an ELT process where the transformation is applied at the target system. Transforming data within a RDBMS is referred to as in-database transformation.
The ETL process to move data to a different system, such as using SQL to move and transform data can be very expensive and CPU intensive. Also the data transformation, movement, maintenance among different subsystems can also cause several other problems like:
- Significant complexity: Data is move from operational databases to separated data warehouses/data marts to support analytics
- Analytics latency: Transactional data is not readily or easily available for analytics when created
- Lack of synchronization: Data is not easily aggregated and users are not assured they have access to “fresh” data
- Data duplication: Multiple copies of the same data is proliferated throughout the organization difficult to document data lineage
- Excessive costs: An IT infrastructure that was not designed nor can support real-time analytics
DB2 Analytics Accelerator enables DB2 for z/OS to perform these data transformations in an efficient way by using accelerator-only tables. Using accelerator-only tables, you can now perform SQL data transformations completely on the accelerator, reduce the need of move data across different systems, therefore greatly simplify the data transformation/movement process and address the above listed issues.
The accelerator-only table (AOT) is a new table type. The data of AOT exists only in the accelerator, not in DB2. The user can use DB2 Insert (both from VALUES clause and from Subselect), Update, and Delete statement to populate the data to the accelerator. With AOT, the data of intermediate tables generated by the ELT logic are populated and exists directly within the accelerator and without the need for bring back to DB2. It can be subsequently used for the remaining transforming statements. The whole ELT logic can run in the accelerator, as shown in Figure 5.
Figure 5 In accelerator ELT process
Figure 6 shows class 2 elapsed times for inserting 1, 10, and 100 million rows through subselect into declared global temporary tables (CL2EL_DGTT) and partition-by-growth (CL2EL_PBG) tables within DB2
for z/OS and AOT tables within an accelerator (CL2EL_IDT).
Figure 6 Class 2 elapsed time for inserting rows into different target table
Figure 7 shows the CPU differences.
Figure 7 Class 2 CPU time for inserting rows into different target table
In this section we examine “data mining”, the analysis of vast amounts of data in order to derive business insight or predictive models that can be applied in operational processes. For example, the analysis of billions of payment transactions generates models that can be applied in a payment authorization system to predict the likelihood of a single, new payment transaction to be fraudulent.
While the capabilities described so far are all available today on the System z platform, the technology evaluated here is not generally available yet.
With today’s technology, a data analyst would typically access specifically prepared data in a data mart to perform data analysis and modelling. The data mart is generated from the enterprise data warehouse containing a subset of the data relevant to the analytical task. It represents another island of information and often limits the data analyst, for example, to a subset of the available data – one year of payment transaction instead of all, or only limited information that has been selected a-priori to be potentially relevant for the task. Also the data is usually not up-to-date. It would be beneficial for the quality of the predictive models if the data analyst had access to all data in the warehouse to consider it in the data mining process. Another problem is that the resulting models rely on input data according to the data model in the data mart. This data model may not be available to the operational system in order to apply the model (“scoring”) in transaction processing.
Accelerator-only table and Analytics stored procedure support that is available in V5 will enable in-database analytics with significant performance advantages. IBM DB2 Analytics Accelerator V5 supports a subset of existing analytical capabilities that are well-known from IBM Pure Data for Analytics appliances. The analytical capabilities added to the Accelerator are provided by additional DB2 for z/OS stored procedures. These DB2 for z/OS stored procedures act as a wrapper to call other stored procedures that are available on the Accelerator only. These Accelerator stored procedures perform analytical functions in the accelerator only.
SPSS Modeler 17 now can then use SQL generation in combination with in-database modeling which results in analytics streams that can be run from start to finish in the database, resulting in more than 10x performance gains over streams run in SPSS Modeler. With these capabilities the limitations above disappear – resulting in better models that can be directly applied in operational systems.
We have investigated the performance of in-database analytics and compared two implementations of the same analytics stream:
- (New) SPSS in-database analytics for DB2 z/OS using a KMEANS clustering algorithm running inside the Accelerator with all data preparation also performed inside the Accelerator.
- (Today) the same stream using the SPSS KMEANS clustering algorithm running in SPSS Modeler server with data preparation via SQL pushback to DB2 z/OS accelerated by the Accelerator.
Figure 8 Simple SPSS analytics stream for performance test
The analytics stream was very simple – just a few data source nodes and join operations. In reality, data preparation will be far more complex. The benefits of in-database processing will be significantly higher then. Figure 5 shows our simple stream in the SPSS Modeler GUI.
We then ran the stream in both variants – in-database vs. in-Modeler – with different input data sizes. Figure 9 shows the results.
Figure 9 Performance results for in-database analytics
Our key results were:
- Both approaches scale linearly, but the in-database version has a much lower gradient.
- The parallel implementation of the in-database solution has some initial overhead which makes it slower for very small data sizes. But starting at only 1 million input records, the in-database solution is faster end-to-end.
- With 20 million input records – which is still a “small” data size in many typical applications – the in-database solution performs 10x faster end-to-end. The data preparation part is 25x faster and the clustering algorithm is 6x faster in this case.
In a “first-of-a-kind” project with a customer, we applied this technology to their original data and their current analytical streams. In that project, the results were even better: we saw up to 250x performance gain in data preparation and 20x for the mining algorithm.
In-database analytics is a key piece in our integrated transaction and analytics processing strategy. The initial results are extremely promising and we expect great benefits for customers adopting this technology once it will be available.
Five key takeaways
- Many organizations are trying to deliver instantaneous, on-demand customer service with IT systems designed to provide after-the-fact intelligence.
- Achieving insight with every transaction demands a holistic implementation of an integrated data lifecycle with business-critical systems.
- z Systems has the vision, strategy and technology to fuse transactions and analytics by eliminating the latency and complexity pitfalls that develop with a distributed approach.
- z Systems "operational analytics" builds advanced decision management support on this integrated data platform injecting intelligence into operations without sacrificing performance.
- Truly transformational business opportunities require truly transformational infrastructure - and that infrastructure is z Systems.
- Paul DiMarzio, Jeffrey Frey: System z Transaction and Analytics Processing, IBM Systems and Technology Analyst Insight, 09/2014
- Dan Wardman: Big Data and Analytics on System z for Proven Competitive Advantage, 09/2014
- Gartner Inc.: Magic Quadrant for Operational Database Management Systems, G00261660, 2014/10/16
- Gartner Inc.: Hybrid Transaction/Analytical Processing Will Foster Opportunities for Dramatic Business Innovation, G00259033, 2014/01/28