Introduction
For the past 30 years, our focus as DB2 for z/OS database administrators has been on the support of COBOL programs with static queries that were relatively easy to control, as they were identical throughout the business logic. Static SQL provided persisted access paths in the database system, and the only thing we had to do was make sure that these access paths were suitable for all the situations in which the queries were used.
In the past decade a silent but nevertheless profound revolution has been taking place, as step by step object-oriented Java (and .NET) based applications are replacing the old COBOL programs. These applications do not rely on static SQL, but on dynamic statements generated by rather complex frameworks that have evolved rapidly along with the growing use of Java.
In this article, a quick overview of the basics of Java SQL performance on DB2 for z/OS is presented. The idea is to give insight into the mechanisms driving dynamic SQL coding in Java.
The article will not, however, discuss the general techniques for improving the performance of dynamic SQL, as a lot of documentation has already been written in that regard. Instead, I will describe how the code generation relates to those findings and how performance can be improved by choosing the correct execution properties.
Object Relational Mapping
What is it?
Object Relational Mapping or ORM may sound familiar to you, as it has been a hot topic in the past decades. But do you know what it really means?
First, it was seen by the Java community as a magic wand providing a Java API to relational data; next, it was judged as evil by the RDBMS community because the first versions of the frameworks generated SQL that was not suitable for large production systems.
In reality, an ORM is a framework like any other: something you need to tune according to your needs. The ORM is just a means of allowing easy communication between the Java hierarchical object logic and the database relational logic. The mapping can be done in many ways. Which option you choose will depend on both your application logic and the underlying relational data that is being targeted.
Object-Relational Mapping Technology
Domain based or SQL based object-relational mapping
In so-called ‘domain based object-relational mapping’, Java objects are mapped to the relational database model, starting from the hierarchical object model used within an application. A domain based ORM provides a domain based query language that allows you to link the hierarchical logic to a relational one and to generate queries suited to your application logic.
Common examples of domain based ORM frameworks are Hibernate, OpenJPA and EclipseLink. Examples of domain based query languages, or domain-specific languages, include JPQL, HQL and QueryDSL.
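To make this concrete, here is a minimal sketch of a domain based query written in JPQL. The entities, attribute names and the persistence unit (‘samplePU’) are assumptions for illustration only, and the persistence.xml configuration pointing at the actual database is omitted. The query is expressed against the object model; the ORM generates the SQL.
Java:
import java.util.List;
import javax.persistence.*;

@Entity
class City {
    @Id String name;
    String postalCode;
    String country;
}

@Entity
class Person {
    @Id Long id;
    String name;
    @ManyToOne City placeOfBirth;   // detail persisted in a separate table
}

public class DomainQueryExample {
    public static void main(String[] args) {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("samplePU");
        EntityManager em = emf.createEntityManager();
        // The query navigates the object model (Person, placeOfBirth);
        // the framework translates it into SQL against the physical tables.
        List<Person> result = em.createQuery(
                "select p from Person p where p.placeOfBirth.country = :country",
                Person.class)
            .setParameter("country", "Belgium")
            .getResultList();
        em.close();
        emf.close();
    }
}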
In ‘SQL based object-relational mapping’ Java objects are linked to parts of SQL queries. The framework builds SQL using the building blocks provided by the developer.
A common example of SQL based ORM is iBATIS/MyBatis.
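As a hedged sketch of the SQL based approach (the mapper, result class and the Persons table columns are illustrative assumptions), a MyBatis annotation mapper lets the developer provide the SQL building block while the framework handles parameter binding and result mapping:
Java:
import java.util.List;
import org.apache.ibatis.annotations.Param;
import org.apache.ibatis.annotations.Select;

// Plain result object; MyBatis maps the selected columns to these properties.
class PersonRow {
    public String name;
    public java.sql.Date dateOfBirth;
}

// The developer writes the SQL; the framework only binds parameters and maps rows.
public interface PersonMapper {
    @Select("select Name as name, Date_of_Birth as dateOfBirth "
          + "from Persons where Place_of_Birth = #{city}")
    List<PersonRow> findBornIn(@Param("city") String city);
}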
SQL based ORM is far less popular than domain based object-relational mapping, as it assumes a connection to relational data, whereas a domain based ORM starts from the Java logic and can hence address any type of underlying data (as is actually done in the larger JDO framework, which is a superset of JPA). This is very valuable in the context of a Service Oriented Architecture in which ‘loose coupling’[1] is assumed.
This article will focus on the logic that is held within domain based ORM, as it is the most popular ORM type.
Fetching strategies
JPA is a technology based on what was originally developed by Hibernate. As such, Hibernate is one of the best examples to describe how JPA logic is built. This doesn’t mean that there are no reasons to choose other ORM frameworks.
In this article the focus is on Hibernate, though a few remarks on OpenJPA and EclipseLink will be made. Unfortunately, it is impossible to discuss all the available frameworks, but most of them use an identical or similar methodology to map domain objects to the relational database schema.
The latest incarnation of Hibernate includes ten different fetching strategies that address two dimensions.
The first dimension is how we want to fetch detail data from the database. The main entity is obviously fetched with a select, but the complexity arises when you need to fetch additional details that sit in the same hierarchy in the domain model but are persisted in different tables in the relational database.
In Hibernate, you can choose to fetch these details as follows (a mapping sketch is given after the list):
- with an outer join of the main and the detail table
(= join fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
SQL:
Select Persons.Name,Persons.Date_of_Birth,Cities.Name,Cities.PostalCode,Cities.Country
From Persons
Left outer join Cities on Persons.Place_of_Birth=Cities.Name;
- with a second select on the detail table, next to the select on the main table
(= select fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
SQL:
Select Persons.Name,Persons.Date_of_Birth
From Persons;
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities;
- using a subselect associating the main table data to the detail table data
(= subselect fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
SQL:
Select Persons.Name,Persons.Date_of_Birth
From Persons;
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities
Where exists (Select 1 from Persons where Persons.Place_of_Birth = Cities.Name);
- using a single select to retrieve the details for a whole batch of main-table rows at once, by listing their keys (= batch fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
SQL:
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities
Where Cities.Name in (?, ?, ?, ?);
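Before moving to the second dimension, here is a hedged sketch of where this first-dimension choice is declared in a Hibernate annotation mapping. The entity and attribute names are hypothetical, modeled loosely on the People/Cities example above:
Java:
import java.util.Set;
import javax.persistence.*;
import org.hibernate.annotations.BatchSize;
import org.hibernate.annotations.Fetch;
import org.hibernate.annotations.FetchMode;

@Entity
class Person {
    @Id Long id;
    String name;
    @ManyToOne City placeOfBirth;
}

@Entity
class City {
    @Id String name;
    String postalCode;
    String country;

    // The first-dimension choice is declared on the association:
    //   FetchMode.JOIN      -> join fetching (a single outer join)
    //   FetchMode.SELECT    -> select fetching (a second select)
    //   FetchMode.SUBSELECT -> subselect fetching
    @OneToMany(mappedBy = "placeOfBirth")
    @Fetch(FetchMode.SELECT)
    @BatchSize(size = 25)   // batch fetching: up to 25 keys per generated IN list
    Set<Person> personsBornHere;
}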
The second dimension is when we want to fetch the detail. As documented for Hibernate (or your favorite ORM), the options are the following (a mapping sketch follows the list):
- Fetch all details (all child nodes) for the object we are targeting in the hierarchical object model.
(= immediate fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
SQL:
[Java: method: getName()]
Select Persons.Name,Persons.Date_of_Birth
From Persons;
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities;
[Java: method: getCity()] -- Data already in variables
- Fetch the detail of the child node related to the object we target whenever it is requested in the application.
(= lazy collection fetching (default behavior))
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
Child Entities: Person{Name,DateOfBirth}, Cities{City,PostalCode,Country}
SQL:
[Java: method: getName()]
Select Persons.Name,Persons.Date_of_Birth
From Persons;
[Java: method: getCity()]
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities
Inner Join Persons on Cities.Name=Persons.Place_of_Birth;
- Only fetch the exact attributes of the child node detail whenever requested by the application.
(= extra lazy collection fetching)
Java:
entity: People {Name, DateOfBirth, City, PostalCode, Country}
Child Entities: Person{Name,DateOfBirth}, Cities{City,PostalCode,Country}
SQL:
[Java: method: getName()]
Select Persons.Name
From Persons;
[Java: method: getCity()]
Select Cities.Name
From Cities
Inner Join Persons on Cities.Name=Persons.Place_of_Birth;
- Fetch the details of an associated object when a method other than the one it is identified with[2] is invoked on that object
(= proxy fetching)
Java:
entities:
People {Name, DateOfBirth, City, PostalCode, Country}
Nationalities {Country,CountOfEmployees}
[This entity is based upon the data of the People entity]
SQL:
[Java: method: getNationalities()]
Select Cities.Country, count(*)
From Persons
Inner join Cities on Cities.Name=Persons.Place_of_Birth
Group by Cities.Country ;
- Details of a single-valued association are fetched when the instance variable is accessed, whatever the method being used
(= no-proxy fetching)
Java:
entities:
People {Name, DateOfBirth, City, PostalCode, Country}
Nationalities {Country,CountOfEmployees}
[This entity is based upon the data of the People entity]
SQL:
[Java: method: getNationalities()]
Select Cities.Country, count(*)
From Persons
Inner join Cities on Cities.Name=Persons.Place_of_Birth
Group by Cities.Country ;
Select Persons.Name,Persons.Date_of_Birth
From Persons;
Select Cities.Name,Cities.PostalCode,Cities.Country
From Cities;
[Java: method: getCount()] -- Data already in variables
- Details of an attribute or single-valued association are fetched when the instance variable is accessed, whatever the method being used
(= lazy attribute fetching)
Java:
entities:
People {Name, DateOfBirth, City, PostalCode, Country}
Nationalities {Country,CountOfEmployees}
[This entity is based upon the data of the People entity]
SQL:
[Java: method: getCountry() -- for Nationalities entity]
Select Cities.Country
From Persons
Inner join Cities on Cities.Name=Persons.Place_of_Birth
Group by Cities.Country ;
[Java: method: getCount()]
Select Cities.Country, count(*)
From Persons
Inner join Cities on Cities.Name=Persons.Place_of_Birth
Group by Cities.Country ;
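Again as a hedged sketch with hypothetical names (the ‘biography’ attribute is invented purely to show lazy attribute fetching), this is roughly where the ‘when’ dimension is declared in an annotation mapping; the exact behavior depends on the Hibernate version and on whether bytecode enhancement is enabled:
Java:
import java.util.Set;
import javax.persistence.*;
import org.hibernate.annotations.LazyCollection;
import org.hibernate.annotations.LazyCollectionOption;

@Entity
class City {
    @Id String name;
    String postalCode;
    String country;

    // FetchType.EAGER -> immediate fetching: details loaded together with the owner
    // FetchType.LAZY  -> lazy collection fetching: details loaded on first access
    @OneToMany(mappedBy = "placeOfBirth", fetch = FetchType.LAZY)
    @LazyCollection(LazyCollectionOption.EXTRA)   // extra lazy collection fetching
    Set<Person> personsBornHere;
}

@Entity
class Person {
    @Id Long id;
    String name;

    // Proxy/no-proxy behavior for single-valued associations and lazy attribute
    // fetching are Hibernate-specific refinements of this basic LAZY hint and may
    // require bytecode enhancement.
    @ManyToOne(fetch = FetchType.LAZY)
    City placeOfBirth;

    @Basic(fetch = FetchType.LAZY)   // lazy attribute fetching (a hint to the provider)
    String biography;
}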
In OpenJPA, the implementation is much simpler: you choose how eagerly you want to fetch your data, not eagerly (eager fetch mode ‘none’), with joins (eager fetch mode ‘join’) or in parallel (eager fetch mode ‘parallel’). You can achieve more or less the same types of fetching strategies as in Hibernate by specifying both a fetch mode and a subclass fetch mode. For example, choosing to fetch in parallel with ‘join’ as the subclass fetch mode more or less emulates Hibernate’s lazy collection fetching.
What do you need to consider?
With a ‘lazy’ fetching strategy, a program will generate queries at the moment a given entity or attribute in the database is accessed; in terms of Java we would say the ORM fetches a single node of the object model’s hierarchy. This should result in a series of simple SQL queries if your database is well designed. Use this strategy when you need the detail only occasionally, as you will avoid fetching unnecessary data from the database.
With an ‘eager’ fetching strategy, a program will generate queries with joins that fetch all details for your Java object up to the degree you specify. For example, if you fetch an object eagerly to degree 3, you will generate a query fetching all entities related to the Java object, its children and its grandchildren in the object model’s hierarchy. Use this strategy when you need this data throughout your application logic; this avoids the need to issue heavy queries each time you make use of the detail.
A ‘batch’ fetching strategy, referring to the first dimension, will generate a series of simple queries rather than a join. This strategy will generate more SQL than when using a ‘join’ strategy, but the queries will be less complex. Use a batch fetching strategy when you need data from many entities; a join strategy should be considered when the level of complexity is acceptable. Hibernate adds the option to explicitly request subselects, allowing you to accurately control how the SQL statements are generated.
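Two Hibernate-wide settings help strike this balance: the maximum join depth used for eager fetching and the default batch size. The following is a minimal sketch, assuming classic programmatic bootstrapping with a hibernate.cfg.xml on the classpath; the values are illustrative starting points, not recommendations.
Java:
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateBootstrap {
    public static SessionFactory build() {
        Configuration cfg = new Configuration().configure(); // reads hibernate.cfg.xml
        // Limits how deep outer-join (eager) fetching may cascade.
        cfg.setProperty("hibernate.max_fetch_depth", "3");
        // Default number of keys per IN list when no batch size is set on a mapping.
        cfg.setProperty("hibernate.default_batch_fetch_size", "16");
        return cfg.buildSessionFactory();
    }
}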
In this regard, using these strategies implies the same considerations as when we wrote the SQL statements ourselves. However, a DBA needs to look a little further than this first impression.
The complexity of queries will make SQL statements faster or slower. On the one hand, a developer needs to choose carefully how many degrees deep he wants joins to go; on the other hand, when joins are split into many separate queries, the statement cache will need to be very large to hold the resulting number of prepared statements. Then again, reducing the number of joins can, up to a certain degree, generate more identical queries; this can have the side effect that cache usage is optimized rather than deteriorated. Finding a balance between the strategies is hence key to good SQL performance.
In the first place, this is a task for the application developer, who should know his business model very well; on the other hand, knowing how these dimensions work will allow the database administrator (or should I say data architect) to evaluate how the source code could be improved when incorrect SQL patterns appear.
The footprint of the strategies in use can be found in the dynamic statements issued by a Java application. It is a good practice to monitor the SQL statements and to identify these patterns for applications using this type of technology.
How does this relate to DB2 for z/OS performance in particular?
DB2 for z/OS holds Enterprise Data
Achieving quality in the generated SQL depends not only on how the logic of an application is written, but also on the way the database is modeled. This is the first point where DB2 for z/OS Database Administrators need to pay special attention.
In many distributed environments, applications rely on small dedicated databases that only address the data for a given domain. The mapping in that situation is rather simple, and its quality is not as critical as on DB2 for z/OS, since no shared resources are endangered.
The concept of an enterprise database, on the other hand, is something that is linked by its nature to DB2 for z/OS, although not exclusively. DB2 for z/OS has always been a central repository for data across the enterprise. Seldom is DB2 for z/OS used to host a large series of independent data domains with no relations in between; more often it hosts data from a wide variety of enterprise domains and allows the business to correlate that data (e.g., the financial data with the HR system and the productivity databases).
Although DB2 for z/OS does not serve the complete needs of the enterprise any longer (given the evolution to network databases and big data), the enterprise logic still applies as DB2 remains one of the primary sources of structured enterprise data.
Enterprise data structures survive application processes
Also by nature, enterprise data structures survive application logic. The data structure that is stored in the enterprise will evolve, but much slower than the processes that make use of it. A computer, for example, will remain a computer in many aspects even if it has become a smart phone or a laptop; the way you use the devices, however, will significantly differ. Hence it is vital that enterprise data modeling be handled separately from the application data interface modeling.
When domain data is modeled properly in the context of the enterprise with respect for the relational logic, the application API can be modeled as a logical layer upon this data implementation by means of views, aliases, stored procedures, etc.
In this logical layer, application API tuning can be done by the application DBA. This is crucial to the performance of the SQL generated by the ORM. Thanks to logical application layers, the quality of the actual SQL issued against the database can be optimized by applying the correct API filters.
A good example of this kind of logical layering is the application of the ‘specialization’ and ‘generalization’ data modeling schemes within the database, along with views offering a denormalized picture of the data. With specialization, the attributes belonging to one primary key are split into multiple groups; for example, a car can have technical specifications but also sales information. Generalization is the reverse: a common set of attributes is taken out of two or more tables and placed in a central table that shares the same primary key with each of these tables.
The business application may regard this dataset as a single object, but storing it in a single relational table would cause many data fields to be null. If we instead model the data so that we have a parent table with the common attributes and two child tables, each with its own set of attributes, we can build a view upon these tables that provides a complete overview.
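On the Java side, such a layer can be consumed as a single entity. The following is a hedged sketch that assumes a hypothetical view V_CAR_COMPLETE joining the parent table with the two specialization tables; the entity is mapped read-only since the view is meant for retrieval.
Java:
import java.math.BigDecimal;
import javax.persistence.*;
import org.hibernate.annotations.Immutable;

// A single "complete" business object mapped onto a DB2 view that joins the
// parent table with its specialization tables; all names are illustrative.
@Entity
@Immutable
@Table(name = "V_CAR_COMPLETE")
class CarOverview {
    @Id
    @Column(name = "CAR_ID")      Long id;
    @Column(name = "MODEL")       String model;         // common (generalized) attributes
    @Column(name = "HORSE_POWER") Integer horsePower;   // technical specialization
    @Column(name = "LIST_PRICE")  BigDecimal listPrice; // sales specialization
}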
When correctly tuned, Hibernate will select the attributes as required within the application logic and the optimizer will be able to transform the query with its aggressive merge capabilities (see the DB2 10 for z/OS Performance Topics Redbook) and filter the data more effectively.
Alternatively, you can map the objects in your application differently by targeting the individual tables as children of a parent entity when a single detail type is often required. Obviously, less complicated queries could appear in this case. This will probably be the option that will provide you with the greatest performance gain.
This optimization is something you can do without changing the application code; it only requires tuning of the database and the Hibernate configuration. The use of aliases in your database can provide additional flexibility for this reconfiguration.
‘Generalization’ and ‘Specialization’ also appear in the context of inheritance and Java objects[3]; this resemblance between the data modeling of the hierarchical and the relational model gives the DBA indications that can be used to align the database schema in a similar way.
Of course each situation needs to be assessed individually, but if this is done with care, eventually the automated mapping methods will provide improved SQL and the costs of running Java queries on DB2 for z/OS will decrease accordingly.
Conclusion
To achieve the best possible performance on DB2 for z/OS with a standard Java program using a domain based ORM, one should embrace the technology and learn about its aspects. This will allow easier conversations between developers and DBAs. Although the main responsibility for the ORM configuration will lie with the application developer, the DBA should monitor for aberrant usage of these configurations.
It is also vital that a DBA has in-depth knowledge of the business domains his database is related to. First, he needs to model this domain data within a relational context, abstracting away from the processes that will make use of it. Next, the DBA should match that purely relational model to the processes that are applied to it. Ultimately, a clear distinction between a logical layer and a physical layer should be established to provide the flexibility required to support processes that change relatively fast.
The quality of the generated SQL code remains a central asset, but the final control over how access is done remains in the hands of the DB2 for z/OS DBA, who is a specialist in his platform. A logical application API in the database gives the DBA additional means to get a grip on what is generated by the Java application.
To complete a competitive and well-performing Java implementation, the DBA adds a valid JDBC property configuration to the data source used by the application and takes into account the many aspects of dynamic SQL performance tuning described in the literature.
[1] Loose coupling doesn’t mean that the application should be strictly segregated from its underlying technology, as this is virtually impossible. However, loose coupling does assume that you have made sufficient effort to ensure that you can easily switch from one technology to another without impacting too many application components.
[2] In Java objects, entities are related to so-called get methods (e.g., if you need people information, a method getPeopleInfo will retrieve the data related to the PeopleInfo object). This get method is the method that identifies that object.