Why pureXML Still Matters

It is pretty clear that JSON is one of the key technologies of the moment, and this is reflected in many of the latest advances associated with DB2. Among the JSON-related developments in the DB2 space are the RESTful web service support in DB2 for z/OS, the IBM Data Server Gateway for OData and the recent resurgence in JSON support in DB2 for LUW. If you read large parts of the trade press you’d think that XML had been consigned to the history books: XML is “old and boring”, and if you are a “cool kid” then JSON is all you need.

So, if you have spent time acquiring pureXML skills, has all the effort been worthwhile? If you haven’t yet gotten into XML, is there any point in learning it now? And if you are involved in discussions with IBM on the future development of DB2, for example via a DB2 Technical Advisory Board or Customer Advisory Council, would there be any sense in asking IBM to fill in the remaining gaps in its support for the various XML standards? My answer to all three questions would be “YES”: XML still has a massive part to play, and the capability of DB2 to support it will be critical for a long time to come. Let me try to explain why I believe this is the case.

JSON and XML are not just slightly different ways of doing the same thing. This is a common misunderstanding, probably caused by the superficial similarities between the text representations of JSON and XML. Folks who hold this view normally go on to point out that the JSON representation has fewer characters than the XML representation, and therefore JSON is better. But there is a lot more to XML than its textual representation, and only when you understand this will you start to realise its power. Here are some important aspects of XML which bear thinking about:

  • XML Schemas – the ability to validate that XML conforms to a certain structure is useful in a wide variety of situations. In the simplest form, just being able to check that the XML has arrived undamaged is a benefit. But much more significant is the use of XML Schema as a means of defining data interchange standards which all providers and consumers of services can follow. In some ways, this can be thought of as a contract between the parties: as long as the consumer follows the standard defined in the XML schema, the service provider will be able to respond to their requests in a way they can understand. The fact that there are XML Schema based standards for almost every conceivable business sector shows both the importance and the relevance of this part of the XML standard.
  • XML Namespaces – the capability to combine elements from different XML Schemas in one document, while providing a means of differentiating between them, is critical to the success of data interchange in complex situations, particularly when building interfaces between different business domains. For example, a “page” means a totally different thing to a DB2 DBA and to the author of a book. Everything is fine until someone writes a book on DB2 memory management and refers to both in equal measure. By applying a namespace (giving context) to each use of the term, the confusion disappears.
  • XPath – having a query language which succinctly allows you to access any part of an XML document is a very powerful feature. And when multi-document access using languages such as XQuery and SQL/XML, which rely on XPath as their basis, is added to the mix, the power becomes even greater. Many people are not aware of the full power of XPath, with its many built-in functions and the concept of axes (referring to one component based on its relationship to others), but time spent exploring this is extremely worthwhile.
  • XSL – being able to transform an XML document into another format (XML or non-XML) using the Extensible Stylesheet Language is a part of the XML ecosystem whose usefulness is underestimated.
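
The “page” example above can be sketched in a few lines of Python. This is only an illustration: the namespace URIs, element names and attributes are invented for the purpose, and Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath (no functions and only a few axes), so it shows the idea rather than the full power of the language.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration.
NS = {
    "db": "http://example.com/db2",
    "bk": "http://example.com/book",
}

# A book about DB2 memory management: book pages that mention database pages.
DOC = """\
<bk:book xmlns:bk="http://example.com/book" xmlns:db="http://example.com/db2">
  <bk:chapter title="Buffer pools">
    <bk:page number="12">
      <db:page size="4K"/>
      <db:page size="32K"/>
    </bk:page>
    <bk:page number="13">
      <db:page size="8K"/>
    </bk:page>
  </bk:chapter>
</bk:book>
"""

root = ET.fromstring(DOC)

# Same local name "page", two different meanings, kept apart by namespace.
book_pages = root.findall(".//bk:page", NS)
db_pages = root.findall(".//db:page", NS)

# A predicate narrows the path: database pages mentioned on book page 12.
on_page_12 = root.findall(".//bk:page[@number='12']/db:page", NS)
sizes_on_page_12 = [p.get("size") for p in on_page_12]

print(len(book_pages), len(db_pages), sizes_on_page_12)
```

Without the prefixes, a search for “page” would mix the two meanings; with them, each path expression says exactly which kind of page it wants.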

It is interesting to see that as JSON has evolved, there have been efforts to provide (or at least talk of providing) features similar to all of the above XML capabilities. It is also noticeable that most of these efforts are not progressing very rapidly. I believe that this is because the principal use cases for JSON are, in most cases, different from those for XML, and the addition of XML-like functionality actually detracts from the JSON-specific benefits. Also, the main advocates of JSON are very different from those of XML, with JSON being largely driven by application developer requirements whereas XML has a wider following in the data management and standards communities.

Probably the biggest benefit of JSON is that it has a structure which is directly usable by most common programming languages, built around arrays (simple and associative). So for application-to-application, and particularly for intra-application (e.g. AJAX), data transfers it is ideal. But when it comes to complex interactions, particularly those involving multiple parties, the additional features of XML soon become invaluable. XML features are also beneficial for long-term storage, particularly the ability to validate the XML and to perform queries across sets of XML documents.
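
As a small illustration of that direct mapping, here is a sketch in Python. The payload is made up for the example; the point is only that a JSON object becomes an associative array (a `dict`) and a JSON array becomes a simple array (a `list`), with no intermediate document model to navigate.

```python
import json

# Hypothetical payload of the kind an AJAX call might return.
payload = '{"table": "EMPLOYEE", "columns": ["EMPNO", "LASTNAME"], "rows": 42}'

data = json.loads(payload)        # JSON object -> dict (associative array)
columns = data["columns"]         # JSON array  -> list (simple array)

columns.append("SALARY")          # ordinary list operations apply directly
round_trip = json.dumps(data)     # and the structure serialises straight back

print(type(data).__name__, type(columns).__name__)
print(round_trip)
```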

So should I be focusing my learning efforts on JSON or on XML? I would suggest that the answer depends very much on your use cases. If you are supporting developers who are producing dynamic web or mobile applications, then a knowledge of JSON and of the support DB2 provides in this space will be critical. If your focus is more on B2B applications, or applications where conforming to an industry standard is important, then XML is going to be your tool of choice.

It will be interesting to see where the two technologies go in the years ahead. I would expect that once the current JSON hype dies down we will find both technologies being used to their strengths. I would be surprised if JSON ever acquired all the additional features that XML has, because this would introduce overheads that would lessen its usefulness in its principal intra-application use case.

What I think we can expect is much more native support for JSON inside all the DB2 products. If I were to give a priority to the requirements then it would be:

  • Functionality within SQL to both construct and consume JSON. This is because JSON is typically the method of choice for AJAX operations, where both the request and the response are in JSON format. I believe there is a lot to be gained by pushing more of the processing down into the data layer. I’ve always been a big advocate of the use of stored procedures as “data services” and have had good success with using them in XML-based service environments. Being able to build JSON services in the same way would, I believe, make integrating AJAX toolkits with DB2 much easier. The focus of DB2 development in this space at the moment seems to be largely on JSON consumption, at the expense of JSON construction. I believe that both are needed to make this a viable pattern.
  • Closer integration between middleware and DB2 to provide the speedy response that microservices need. We are already seeing this integration happening on both platforms, with facilities to provide microservices on top of both DB2 tables and stored procedures appearing in DB2 for z/OS and DB2 for LUW. In my opinion the ability to expose stored procedures as microservices will become more important than the ability to expose tables, since only a small number of use cases can be satisfied by single table access (although this can be extended in some types of service by the use of views).
  • Support for a native JSON data type within procedural SQL languages. Since I believe there will be a need to expose stored procedures as microservices, direct support for JSON structures, via their component arrays, in procedural languages is going to be necessary for performance. Note that because the underlying structure of JSON is based on arrays, support for the various array types in SQL is really a prerequisite to this. In other words, it isn’t simply a different form of the parsed and modified DOM tree that is used by pureXML.
  • Support for a native JSON data type in DB2 tables. You’ll notice that I’ve separated this out from the support in procedural languages. This is deliberate, since I believe that the need to store JSON when it is being used as an intra-application data interchange method is not as common as the need to store XML when it is being used in a B2B setting. However, over time I expect there will be more occasions where JSON storage will be required, as regulation continues to tighten the requirement for audit. If some of the features which make XML more appropriate for B2B solutions were to find their way into JSON, this too would make JSON storage more likely to be needed. It is also becoming common for JSON to be used as a storage format for Big Data solutions, and again this would be a situation where storing JSON in a native format in DB2 would make sense.
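
To make the “data service” pattern from the first requirement concrete, here is a minimal sketch in Python. It is not DB2 code and does not use any DB2 API: an in-memory SQLite database stands in for DB2, and the `employee_service` function, the `employee` table and its contents are all hypothetical. In the pattern I have in mind, the body of that function would live inside a stored procedure, with the JSON consumed and constructed in the data layer rather than in the application.

```python
import json
import sqlite3


def employee_service(conn, request_json):
    """Hypothetical data service: JSON request in, JSON response out."""
    request = json.loads(request_json)            # consume the JSON request
    rows = conn.execute(
        "SELECT empno, lastname FROM employee WHERE dept = ? ORDER BY empno",
        (request["dept"],),
    ).fetchall()
    response = {
        "dept": request["dept"],
        "employees": [{"empno": e, "lastname": n} for e, n in rows],
    }
    return json.dumps(response)                   # construct the JSON response


# SQLite is only a stand-in here; the table and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER, lastname TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(10, "HAAS", "A00"), (20, "THOMPSON", "B01"), (30, "KWAN", "A00")],
)

reply = employee_service(conn, '{"dept": "A00"}')
print(reply)
```

The caller sees only JSON on both sides of the call, which is what makes this shape convenient for AJAX toolkits, and it is why construction matters as much as consumption.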

I would hope that the focus on adding JSON support to DB2 will not be at the expense of continuing to enhance the pureXML support. Admittedly JSON support needs the most work, and I am pleased to see IBM taking this seriously. But there are still some aspects of the XML standard which are not supported by pureXML, and also a need to standardize the offering across the DB2 family.

Hopefully this article will encourage you to learn both pureXML and JSON, and to give thought to the best solution for each particular use case rather than being caught up in the latest hype for one particular technology.
