ORM, Hibernate and JPA

Business data is usually stored in relational databases, where it persists for use by enterprise applications. Applications use SQL, typically embedded within a programming language, to store and retrieve the data.

Increasingly popular object-oriented languages like Java support relational database access. However, rows of data from tables are different from the web of objects comprising an OO application. An object's fields must be mapped to a table's columns. Objects also participate in relationships with other objects. These relationships provide many of the advantages that OO applications have over applications written in traditional languages (improved maintainability, easier expandability) but also add complexity to the code that maps database data to Java objects. Object identity and table identity may also differ, which raises further issues.

Overcoming the object/relational paradigm mismatch has a cost. Much Java Database Connectivity (JDBC) code is often required to bridge the gap manually. Object/Relational Mapping (ORM) automates the persistence of Java objects to relational tables. It uses metadata to describe the mapping of Java classes to database tables. It transparently transforms data from one representation to the other.
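To make that cost concrete, here is a minimal sketch of the hand-written mapping that ORM automates. It uses a plain Map as a stand-in for a JDBC result-set row so that it runs without a database. The Customer class, the column names and the ManualMappingSketch helper are hypothetical, introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistent class used only for this illustration.
class Customer {
    long id;
    String name;
    String email;
}

class ManualMappingSketch {
    // Hand-written mapping from a row (column name -> value) to an object.
    // With JDBC this logic would read from a ResultSet instead of a Map.
    static Customer fromRow(Map<String, Object> row) {
        Customer c = new Customer();
        c.id = (Long) row.get("CUSTOMER_ID");
        c.name = (String) row.get("CUSTOMER_NAME");
        c.email = (String) row.get("EMAIL_ADDR");
        return c;
    }

    // The reverse mapping, needed for every insert and update.
    static Map<String, Object> toRow(Customer c) {
        Map<String, Object> row = new HashMap<>();
        row.put("CUSTOMER_ID", c.id);
        row.put("CUSTOMER_NAME", c.name);
        row.put("EMAIL_ADDR", c.email);
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("CUSTOMER_ID", 42L);
        row.put("CUSTOMER_NAME", "Ada");
        row.put("EMAIL_ADDR", "ada@example.com");
        Customer c = fromRow(row);
        System.out.println(c.id + " " + c.name + " " + c.email);
    }
}
```

Every persistent class needs a pair of methods like these, and each must be kept in step with the schema by hand. This is the repetitive code that mapping metadata replaces.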

An ORM solution specifies mapping metadata, performs CRUD (create/read/update/delete) on objects of persistent classes, supports queries that refer to classes and their properties and provides transaction support and optimization capabilities. Problems to be addressed by ORM include:

  1. Persistent class design 
  2. Definition of classes and object lifecycle
  3. Definition of mapping metadata
  4. Mapping of class inheritance hierarchies
  5. Determining object identity and object equality
  6. Runtime interaction between persistence logic and business objects
  7. Facilities for searching, sorting, and aggregating
  8. Efficiency, especially when using joins to navigate an object graph

ORM eliminates the tedious work of persistence-related code, allowing developers to concentrate on business logic rather than plumbing. It also reduces the amount of code required, making a system more understandable and easier to refactor. It often improves performance, providing many performance optimizations with simple property settings. Equal performance is possible without ORM but requires greater developer expertise and additional development time. Additionally, ORM insulates the application from the underlying database. ORM implementations support many DBMSs, thereby simplifying the task of changing database vendors.

Hibernate is a popular, open source ORM. It maps Java classes to database tables and maps Java data types to SQL data types. It provides data query and retrieval facilities that generate the SQL calls, relieving developers from manually converting result sets to objects and delivering database portability with very little performance overhead. Hibernate became available through JBoss in 2003. Since Version 3.2, Hibernate has been a certified implementation of JPA (Java Persistence API), the Java standard approach to ORM adopted as part of Java Enterprise Edition (JEE) 5. Hibernate includes an Interceptor/Callback architecture, user defined filters, and JDK 5.0 annotations (Java's metadata feature). Other popular ORM implementations include Oracle TopLink, OpenJPA and EclipseLink.

Java Persistence API (JPA) is part of the Java EE 5 and Enterprise JavaBeans (EJB) 3.0 specification, replacing entity beans, which were seen as too heavyweight and complicated. The specification requires JPA engines to be pluggable and to be able to run outside of a Java EE environment (EJB container). Hibernate implements the JPA specification and provides numerous extensions as well.

JPA 2.0 was approved as final in December 2009, adding features that were present in some of the popular ORM vendor offerings but had been unable to gain approval for JPA 1.0. These include expanded ORM functionality such as support for collections of embedded objects, multiple levels of embedded objects, ordered lists and combinations of access types; a criteria query API; standardization of query hints; standardization of additional metadata to support DDL generation; and support for validation.

Hibernate Architecture

The heart of an ORM is the mapping of objects to the database. Hibernate traditionally has used an XML mapping document to support this. As Hibernate has evolved, support via annotations has been added. Annotations can replace or augment the XML mapping document. The primary task of either form of configuration is to map classes to tables and properties to columns. It also specifies unique identifiers and relationships. 
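As a rough illustration of how annotation metadata can drive a mapping, the sketch below declares its own two annotations and reads them back by reflection. MappedTable and MappedColumn are hypothetical stand-ins invented for this example so that it compiles with the JDK alone. They are not the JPA or Hibernate annotations, and the Account class and its column names are likewise assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical mapping annotations (not the JPA ones).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface MappedTable { String name(); }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MappedColumn { String name(); boolean id() default false; }

// Class -> table, properties -> columns, one property marked as identifier.
@MappedTable(name = "ACCOUNT")
class Account {
    @MappedColumn(name = "ACCOUNT_ID", id = true) long id;
    @MappedColumn(name = "OWNER_NAME") String owner;
    @MappedColumn(name = "BALANCE") double balance;
    String scratch; // no annotation, so it is not mapped
}

class MappingMetadataSketch {
    static String tableFor(Class<?> type) {
        return type.getAnnotation(MappedTable.class).name();
    }

    // Property name -> column name, sorted by property name for stable output.
    static Map<String, String> columnsFor(Class<?> type) {
        Map<String, String> columns = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            MappedColumn col = f.getAnnotation(MappedColumn.class);
            if (col != null) columns.put(f.getName(), col.name());
        }
        return columns;
    }

    // Column that holds the unique identifier.
    static String identifierColumn(Class<?> type) {
        for (Field f : type.getDeclaredFields()) {
            MappedColumn col = f.getAnnotation(MappedColumn.class);
            if (col != null && col.id()) return col.name();
        }
        throw new IllegalStateException("No identifier mapped for " + type.getName());
    }

    public static void main(String[] args) {
        System.out.println(tableFor(Account.class) + " " + columnsFor(Account.class));
        System.out.println("identifier: " + identifierColumn(Account.class));
    }
}
```

An XML mapping document carries the same three kinds of information (table, columns, identifier) outside the class instead of on it.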

Hibernate depends on database tables to store persistent data. Hibernate applications define persistent classes that are mapped to them. The Hibernate Session (called EntityManager in JPA) is the persistence manager. It manages a collection of loaded objects relating to a single unit of work. Hibernate also offers some useful optional APIs: Transaction abstracts the underlying transaction implementation. Query and Criteria support execution of database queries.

Hibernate can be deployed into a managed environment where a Java EE application server like IBM's WebSphere Application Server (WAS) contains session beans or message-driven beans which may use Hibernate. Hibernate integrates with container-managed transactions and DataSources (connection pools). The Java Transaction API (JTA) transaction manager enlists and controls JDBC connections.

Hibernate can also be deployed into non-managed environments. A servlet container like Tomcat may be used for this. A desktop application also constitutes a non-managed environment. Without container-managed transactions and DataSources available, the application must manage database connections and transaction boundaries itself.

Hibernate supports transparent automated persistence, including automatic dirty checking to determine whether updates are necessary for managed objects. This provides a complete separation of concerns between persistence classes and persistence logic. Hibernate generally does not send the SQL it generates to the database until the Session is flushed, which by default happens at commit (and before a query whose results could be affected by pending changes). This allows it to combine all updates in the unit of work into a minimal set of SQL statements. It also minimizes lock duration since the updates are immediately followed by a commit.
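The idea behind dirty checking can be sketched in plain Java: remember each object's state when it becomes managed, then compare at flush time and write only what changed. This is a simplified simulation under stated assumptions, not Hibernate's implementation. The Product class and the DirtyCheckingSketch unit of work are hypothetical, and only one property is tracked.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical POJO; it knows nothing about persistence.
class Product {
    final long id;
    String name;

    Product(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class DirtyCheckingSketch {
    // Snapshot of each managed object's name as it was when loaded or last flushed.
    // Product does not override equals/hashCode, so keys are compared by identity.
    private final Map<Product, String> snapshots = new LinkedHashMap<>();

    void manage(Product p) {
        snapshots.put(p, p.name);
    }

    // Returns the ids of objects that would need an UPDATE statement.
    List<Long> flush() {
        List<Long> dirtyIds = new ArrayList<>();
        for (Map.Entry<Product, String> e : snapshots.entrySet()) {
            Product p = e.getKey();
            if (!Objects.equals(p.name, e.getValue())) {
                dirtyIds.add(p.id);
                e.setValue(p.name); // refresh the snapshot once written
            }
        }
        return dirtyIds;
    }

    public static void main(String[] args) {
        DirtyCheckingSketch unitOfWork = new DirtyCheckingSketch();
        Product pen = new Product(1, "pen");
        Product ink = new Product(2, "ink");
        unitOfWork.manage(pen);
        unitOfWork.manage(ink);

        ink.name = "blue ink"; // ordinary setter-style change, no save call

        System.out.println("updates needed for ids: " + unitOfWork.flush());
        System.out.println("updates on second flush: " + unitOfWork.flush());
    }
}
```

The application code only changes a field. Deciding that an update is needed, and for which object, is left to the unit of work.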

Persistence classes are unaware of the persistence mechanism. No code-level dependencies, superclasses, or interfaces are required. Persistence classes may be used outside of the persistence context. Persistence classes are implemented as Plain Old Java Objects (POJOs): simple lightweight classes similar to JavaBeans. These features make Hibernate applications more readable, more portable, and more testable.

Object Identity

Database identity relates to primary key values. Each row has a unique primary key value. In Java, two references are identical (==) when they point to the same object in memory. Objects may also be considered equal based on code that evaluates their state (equals()). Hibernate allows the definition of an identifier property and supports numerous built-in identifier generator strategies, including database identity columns and sequences, as well as providing the ability to create a custom identifier generator.
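The three notions can be seen side by side in a short sketch. The Employee class is hypothetical, and basing equals() on the identifier property is only one possible design, chosen here for brevity. Comparing a business key is a common alternative.

```java
import java.util.Objects;

// Hypothetical persistent class with an identifier property.
class Employee {
    private final Long id;     // corresponds to the primary key value
    private final String name;

    Employee(Long id, String name) {
        this.id = id;
        this.name = name;
    }

    // Equality by state: here, two employees are equal if their identifiers match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Employee)) return false;
        Employee other = (Employee) o;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }

    String getName() {
        return name;
    }
}

class IdentitySketch {
    public static void main(String[] args) {
        // Two in-memory objects built from the same database row (same key).
        Employee first = new Employee(10L, "Grace");
        Employee second = new Employee(10L, "Grace");

        System.out.println("same object (==): " + (first == second));
        System.out.println("equal (equals()): " + first.equals(second));
    }
}
```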

Composition and Inheritance

Objects are often made up of other objects. Hibernate supports composition through components: a component is a user-defined class that is persisted to the same table as the owning entity. Relational databases do not support inheritance directly. Hibernate allows inheritance hierarchies to be represented in three ways:

  1. One table per concrete class – ignores inheritance and polymorphism
  2. One table per hierarchy – enables polymorphism through denormalization
  3. One table per subclass – uses foreign keys to represent inheritance
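The second option can be sketched as follows: every row of a single table carries a discriminator column that says which subclass to instantiate, and columns belonging to other subclasses are simply left null. The Payment hierarchy, the column names and the discriminator values are hypothetical, and a Map again stands in for a database row.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical class hierarchy stored in one PAYMENT table.
abstract class Payment {
    long id;
    double amount;
    abstract String describe();
}

class CardPayment extends Payment {
    String cardNumber;
    @Override String describe() { return "card " + cardNumber; }
}

class CheckPayment extends Payment {
    String checkNumber;
    @Override String describe() { return "check " + checkNumber; }
}

class SingleTableSketch {
    // PAYMENT_TYPE is the discriminator column shared by the whole hierarchy.
    static Payment fromRow(Map<String, Object> row) {
        String type = (String) row.get("PAYMENT_TYPE");
        Payment p;
        if ("CARD".equals(type)) {
            CardPayment card = new CardPayment();
            card.cardNumber = (String) row.get("CARD_NUMBER");
            p = card;
        } else if ("CHECK".equals(type)) {
            CheckPayment check = new CheckPayment();
            check.checkNumber = (String) row.get("CHECK_NUMBER");
            p = check;
        } else {
            throw new IllegalArgumentException("Unknown discriminator: " + type);
        }
        // Columns common to all subclasses.
        p.id = (Long) row.get("PAYMENT_ID");
        p.amount = (Double) row.get("AMOUNT");
        return p;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("PAYMENT_ID", 1L);
        row.put("AMOUNT", 9.5);
        row.put("PAYMENT_TYPE", "CARD");
        row.put("CARD_NUMBER", "4111");
        // CHECK_NUMBER is absent (null): the price of denormalization.
        System.out.println(fromRow(row).describe());
    }
}
```

A query against the one table can return a mix of subclasses, which is what makes polymorphic queries cheap under this strategy.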

Persistence Lifecycle and Transitive Persistence

The Session is the persistence manager in a Hibernate application (EntityManager in JPA). It is responsible for retrieving a graph of objects while minimizing database hits. It controls the saving of persistent objects when changes occur to them (automatic dirty checking). Hibernate's persistence mechanism is transparent to persistent classes, allowing Hibernate applications to avoid requesting database updates directly. Calls to Hibernate via the Session are used to propagate objects' state in memory to the database.

Objects typically have relationships to other objects. A set of related objects is known as an object graph or web of objects. Transitive persistence propagates updates to subgraph objects automatically. Cascading persistence, supported by Hibernate and JPA, allows the specification of a cascade style for each association mapping, offering flexibility and fine-grained control and enabling transitive persistence.

Hibernate Query Language (HQL) and JPA Query Language (JPQL)

Hibernate Query Language (HQL) is used primarily for data retrieval. Unlike SQL, it offers no data definition statements, and its support for update, insert and delete is limited to the bulk operations described under Optimization Techniques. Its syntax implies a select *, although column specification is supported. Most other standard features of SQL are supported: filtering with a where clause including all SQL operators, ordering, grouping, having, aggregate functions and many other functions, various types of joins including outer joins, subqueries including correlated subqueries and quantifiers (ANY, ALL), as well as positional and named parameters. Queries can also be named and defined in mapping files, becoming reusable via Session's getNamedQuery method.

JPA Query Language (JPQL) is a standardized subset of HQL. JPQL is part of the Java EE 5 and EJB 3.0 standard; HQL is not. How does JPQL differ from HQL? JPQL requires the query to include the select clause; HQL does not. Although JPQL and HQL both support about a dozen standard functions, HQL supports many additional functions beyond the standard ones. HQL supports a syntax shortcut for subqueries. HQL has a well-defined caching strategy; JPA does not. Hibernate is a more mature product; JPA is relatively newer.

Hibernate and JPA 2.0 both support Criteria objects, providing an alternative to the Query object used to request database rows. Criteria use an object oriented approach (method calls) to build database requests rather than the more SQL-like approach of HQL and JPQL. In addition, native SQL syntax can be used in both Hibernate and JPA. Native SQL provides a mechanism to feed hints to the optimizer. HQL, JPQL and Criteria have limited support for optimizer hints.

Transactions

Transactions allow multiple related updates to operate as a single unit of work – all succeeding together or all failing if any one fails. In JDBC, setting AutoCommit to false on the JDBC Connection is required to enable proper transactional processing. This is done automatically by Hibernate as soon as it gets a connection.

Multiple DBMSs may participate in a single unit of work. The Java Transaction API (JTA) supports distributed transactions. Hibernate uses a JDBC Connection to communicate with the database. If Hibernate is being managed in an application server, JTA may be used. Hibernate application code is the same in both environments. The Hibernate Transaction hides the underlying transaction API.

Concurrency and Isolation Levels

Locking is required to ensure that concurrent applications do not damage data integrity or produce erroneous results. Isolation level indicates how long a lock will be held in order to address issues including lost updates, phantom read, dirty read and unrepeatable read.

The tradeoff is between too much isolation which can harm performance and too little which can lead to application issues. Hibernate connection isolation configuration options are applied to every connection obtained from the pool prior to starting a transaction. Hibernate also supports requesting a pessimistic lock on a get request, which can cause a retrieved row to be locked and the lock held until the commit point. Hibernate LockMode options cause a read lock to be obtained if it is necessary to actually read the state from the database, rather than pull it from a cache. The LockMode applies at the session level to all database requests.

Caching

A cache is a local copy of data held in memory, used to avoid a trip to the database under certain circumstances. Hibernate employs a first-level cache and a second-level cache. The first-level cache is managed by the Session and scoped to it, which usually means a single transaction or unit of work. It cannot be turned off. It is used for save(), update(), load(), list(), and other methods of Session.
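The effect of a first-level cache can be illustrated with a small identity map: the first request for an identifier goes to the database, and later requests within the same session return the very same object. This is a simplified simulation, not Hibernate's Session. The Book class, the SessionCacheSketch class and its hit counter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical persistent class.
class Book {
    final long id;
    final String title;

    Book(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

class SessionCacheSketch {
    // Identifier -> already loaded instance, private to this "session".
    private final Map<Long, Book> firstLevelCache = new HashMap<>();
    private int databaseHits = 0;

    // Stand-in for a SELECT against the database.
    private Book loadFromDatabase(long id) {
        databaseHits++;
        return new Book(id, "Title " + id);
    }

    Book get(long id) {
        Book cached = firstLevelCache.get(id);
        if (cached != null) {
            return cached; // no database trip
        }
        Book loaded = loadFromDatabase(id);
        firstLevelCache.put(id, loaded);
        return loaded;
    }

    int getDatabaseHits() {
        return databaseHits;
    }

    public static void main(String[] args) {
        SessionCacheSketch session = new SessionCacheSketch();
        Book first = session.get(1);
        Book again = session.get(1);
        System.out.println("same instance: " + (first == again));
        System.out.println("database hits: " + session.getDatabaseHits());
    }
}
```

Because the map lives and dies with the session object, two sessions never share instances. Sharing data across sessions is the role of the second-level cache.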

The second-level cache is scoped to the server (or cluster). It is optional, pluggable, and configurable. Support for various cache providers is built into Hibernate. Cache providers support fine-grained configuration at the class or collection level. The mapping configuration defines cache regions. The second-level cache poses a risk if legacy applications may be concurrently updating the same data as the Hibernate application, and offers a benefit for relatively static or non-critical data. For mass updates and deletes, Query supports an executeUpdate method, which bypasses all caching.

Object Retrieval Efficiency – Fetching Strategies

Hibernate executes SQL select statements to load objects into memory. Populating object graphs often requires access to multiple tables. A fetching strategy is employed to minimize the number of SQL statements and to simplify the SQL to optimize performance. Hibernate offers several fetching strategies:

1. Immediate fetching entails a sequential database read (or cache lookup). It is the least efficient strategy, unless a requested object is likely to be in cache.

2. Lazy fetching causes retrieval upon first access via database request (or cache lookup). This is the default, recommended approach for optimal performance in most cases. It can be overridden in the mapping configuration.

3. Eager fetching causes associated objects to be retrieved along with the owning object. It makes a single request using an outer join and may be beneficial in some scenarios.

4. Batch fetching causes retrieval of a batch of objects when a lazy association is accessed. It is specified via the batch-size attribute of the class element in the mapping document.

FetchMode can also be set in code on a Criteria object. Hibernate has a global configuration option that controls the number of outer-joined tables that will be used in a single query.
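The essence of lazy fetching, deferring the database request until the association is first touched, can be sketched with a small holder class. LazyRef and FetchingSketch are hypothetical names, and the query counter stands in for real SQL traffic. Hibernate itself achieves this with proxies and collection wrappers rather than an explicit holder.

```java
import java.util.function.Supplier;

// Defers a load until the value is first requested, then remembers it.
class LazyRef<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    T get() {
        if (!loaded) {
            value = loader.get(); // the deferred "database request"
            loaded = true;
        }
        return value;
    }
}

class FetchingSketch {
    int queries = 0; // counts simulated SELECT statements

    // Stand-in for a SELECT of one customer row by identifier.
    String selectCustomerName(long id) {
        queries++;
        return "customer-" + id;
    }

    public static void main(String[] args) {
        FetchingSketch db = new FetchingSketch();

        // Creating the lazy reference issues no query.
        LazyRef<String> customer = new LazyRef<>(() -> db.selectCustomerName(7));
        System.out.println("queries before access: " + db.queries);

        customer.get(); // first access triggers the load
        customer.get(); // later accesses reuse the loaded value
        System.out.println("queries after access: " + db.queries);
    }
}
```

If the association is never accessed, the query is never issued, which is why lazy fetching is a sensible default. The cost appears when many lazy references are touched one at a time, and that is the situation eager and batch fetching are meant to address.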

Optimization Techniques

There are several considerations for tuning object retrieval operations. It is useful to enable the Hibernate SQL log and review the generated SQL. Fetching strategies can be tuned to reduce the number and complexity of SQL queries. Complex outer joins can be tuned with Hibernate's max_fetch_depth option. The number of SQL statements executed can be reduced by turning off lazy fetching and replacing it with one of the eager or batch fetching options.

Regarding object modification, Hibernate configuration can be used to specify SQL statements to support insert, update and delete operations, overriding the default statements generated by Hibernate. Hibernate also allows you to define named queries that call stored procedures, even handling ResultSets they may return. 

Transitive persistence involves the propagation of changes to the entire network of objects related to the one being saved. Hibernate and JPA support numerous cascade options that allow them to determine whether associated objects are to be saved and when. These options apply to inserts, updates, and deletes, and may be used in combination, providing numerous alternatives for tuning.

HQL and JPQL support bulk update and delete operations, bypassing the caching that adds overhead but no benefit in this scenario. HQL additionally supports bulk insert using a subselect and batching updates based on cursor processing. Performing a batch of inserts in a loop or using stored procedures provides additional alternatives.

Conclusion

ORM automates the persistence of Java objects to relational tables, simplifies development, optimizes performance and isolates database interactions. Hibernate is a mature product that provides a very complete implementation of ORM. JPA provides a large subset of Hibernate's capabilities. Automatic dirty checking is used to generate SQL at commit time, minimizing or eliminating the need to code SQL. Persistent classes, implemented as simple POJOs and configured via XML mapping files or annotations, map fields in Java classes to columns in relational database tables as well as mapping class relationships, simplifying access to and update of persistent data in the database. Hibernate supports several transaction mechanisms, including distributed transactions, and minimizes lock duration by holding updates until a commit is issued. The Query object, available in Hibernate and JPA, supports HQL / JPQL as well as native SQL, which allows for optimizer hints.

Both frameworks provide a host of benefits associated with ORM. Hibernate is more mature and capable. JPA is included in the Java standard and continues to improve in its capabilities. Since Hibernate supports the JPA standard as well, it provides the best of both worlds. TopLink, EclipseLink, and Apache’s OpenJPA are other popular alternative implementations of the JPA specification, but Hibernate is the clear industry leader.

 
