DB2 11 Application Developer for z/OS - Part 9 - Access to Non-Structured Data

DB2 for z/OS is no longer about the traditional relational structured data only. The evolution brought us not only binary objects, or XML and JSON (sometimes referred to as semi-structured data), but also other non-structured data living in a distributed file system such as HDFS. DB2 provides means for the application developers to access these kind of data, usually with the standard SQL interface extended by user defined functions.

The ninth part of the application developer test deals with the data structures like XML, JSON, and even Hadoop data, from the application developer perspective. There is a lot of material to go through depending on your experience with these types of data.

Ability to work with native XML data

  • IBM added native XML support in DB2 9 with the feature called pureXML. See an overview of pureXML and the underlying data model. DB2 stores XML data in an internal form (not a string) in separate table spaces from the tables that contain the XML columns.
  • Please note that you could use XML even before DB2 9, for example, using XML Extender, which is obsolete now.
  • Make sure you understand the differences between the XML and relational model. You can store well-formed XML documents in their hierarchical form and retrieve all or just portions of those documents.
  • There is also a brief tutorial how to use XML with a few examples that can be run directly from SPUFI. You can find more details how to work with XML data in the next
  • From the application developer perspective, you need to understand how to use the XML data type.
  • For inserting the XML data you can use SQL INSERT statement. Please note that a document node will be created implicitly if one does not already exist (this is true even for UPDATE or XMLDOCUMENT function).
  • If you need to insert many XML documents at once, you may also consider using the LOAD utility (as you can see on that page, DB2 utilities support XML data).
  • For retrieving the XML data you can use
    • pure SQL for retrieving entire XML document,
    • XQuery, which is a language for interpreting, retrieving, and modifying XML data (XQuery expressions are sometimes called FLWOR expressions). You can use XQuery for
    • XPath, which is a subset of XQuery,
    • or a combination of above.
  • There are two ways of updating the XML document -
    1. to update entire documents, you can use SQL UPDATE;
    2. to update only a portion of the document, you can use XQuery with the XMLMODIFY built-in function in the UPDATE statement.
  • There are three basic updating expressions you can use in XMLMODIFY:
    1. Delete expressions
    2. Insert expressions
    3. Replace expressions
  • To delete rows with XML that contains certain criteria, you can use SQL DELETE and restrict the rows that are deleted by using the XMLEXISTS predicate in the WHERE clause. This will delete the entire row from the table, impacting more than the XML content.
  • Keep in mind that if you store and retrieve the XML document, you may see few differences.
  • The following sections show how you can work with the XML data type in your embedded Especially, how you define host variables, how you retrieve and update the XML data. There are three important recommendations on the previous link:
    • Avoid using the XMLPARSE and XMLSERIALIZE functions - Let DB2 do the conversions between the external and internal XML formats implicitly.
    • Use XML host variables for input and output. Doing so allows DB2 to process values as XML data instead of character or binary string data. If the application cannot use XML host variables, it should use binary string host variables to minimize character conversion issues.
    • Avoid character conversion by using UTF-8 host variables for input and output of XML values whenever possible. In an embedded SQL application, if you retrieve the data into a character host variable, DB2 converts the data from the UTF-8 encoding scheme to the application encoding scheme. If you retrieve the data into binary host variable, DB2 does not convert the data to another encoding scheme.
  • This section shows the details specific for Java applications.
  • Similarly, this is a section for ODBC applications.
  • To improve the efficiency of queries on XML documents, XML indexes can be used.
  • What you need to keep in mind is the XML data encoding. There are two types of encoding of XML data:
    1. XML data that is in character or graphic application data types is considered to be externally encoded. XML data is considered to be encoded in the application code page or as specified by CCSID in SQLDA.
    2. XML data that is in a binary application data type (or has a bit data subtype) is considered to be internally encoded. The content of the data determines the encoding.
  • Internal encoding in a binary application data type is determined from three components:
    1. Unicode Byte Order Mark (BOM), which is a defined byte sequence at the beginning of the XML data
    2. XML declaration, which is a special tag at the beginning of the XML data,
    3. Encoding declaration is an optional part of the XML declaration that specifies the encoding for the characters in the document.

   If there is no BOM and no XML declaration, the encoding is UTF-8.

  • Make sure to review the XML encoding considerations and scenarios to understand what conversions of XML data may occur and in what situations.
  • DB2 may add an encoding declaration at the beginning of the XML output data. If the target data type is a CLOB or DBCLOB type, additional code page conversion might occur, which can make the encoding information inaccurate. Keep this in mind if the data is further parsed in the application as data corruption can result.
  • For an evolution of XML in DB2 for z/OS refer to this blog.
  • Also see the new series on the DB2 Beginner's Blog.

Ability to work with JSON data

  • JSON data can be seen as an alternative to XML data and is very popular in mobile applications. You can see the basic concepts here. JSON support in DB2 is included in the DB2 Accessories Suite for z/OS.
  • DB2 stores JSON data as an internal binary format named BSON (Binary JSON). Therefore, to store the JSON data in DB2, you need to create a BLOB column.
  • DB2 allows you to work with JSON documents using SQL. There are the following functions to work with JSON:
    • JSON_VAL provides and SQL interface to extract and retrieve JSON data into SQL data types from the BSON object. In this function, you specify the BSON data, the search string, and the resulting data type. You can use it to retrieve data into a column in SELECT, but also in a WHERE clause.
    • JSON2BSON converts a JSON data into the internal BSON format.
    • BSON2JSON does the opposite conversion, it converts the binary format JSON into a readable JSON data.
  • The basic SQL operations with JSON data cover:
    • inserting the JSON data using JSON2BSON,
    • updating the data, again using the JSON2BSON function,
    • retrieving data that is inside JSON document using JSON_VAL.
  • Check this section for more examples on manipulating JSON data using SQL.
  • DB2 also provides a NoSQL API to access the JSON data, though I would say the SQL API would be the preferred way. Anyway, the NoSQL interface consists of these parts:
    • Java API that allows storing and selecting in Java applications.
    • JSON command line interface, which you can use as an interactive shell or for running scripts. There are many functions you can use with this interface.
    • Wire listener which acts as a mid-tier between the applications and DB2 and supports many different languages.
  • You can read about the JSON support in DB2 for z/OS on Dan Luksetich's blog.

Ability to perform schema Validation for XML

  • XML schema validation is the process of determining whether the structure, content, and data types of an XML document are valid according to an XML schema.
  • There are two ways to validate the XML schema in DB2:
  • XML schema validation removes all ignorable whitespace characters from the document.
  • To determine whether a document has been validated and which schema has been used for the validation, use the function XMLXSROBJECTID, which returns the XSR object identifier of the XML schema that was used to validate the document, or returns 0 if the document have not been validated.

Ability to work with XML Built-In Functions

  • There are many built-in functions that enable work with XML data. There is one table function, one aggregate function, and several scalar functions.
  • The table function - XMLTABLE - is useful for returning a table from the evaluation of a XQuery expression (as we have see before).
  • The XMLAGG aggregate function can be helpful for aggregating the sequence of XML expressions when constructing a XML document.
  • The XSLTRANSFORM function is useful for transforming a XML document into a different format. This is new in DB2 11.
  • We have also already been discussing XMLQUERY function for retrievals of portions of XML data.
  • See the first section for more details about XMLMODIFY function for XML updating
  • There are several built-in functions that can be used for XML document construction:
    • XMLELEMENT returns an XML value that is an XML element node
    • XMLFOREST returns an XML value that is a sequence of XML element nodes.
    • XMLNAMESPACES constructs namespace declarations from the arguments and can be used only for specific functions, such as the XMLELEMENT function and the XMLFOREST
    • XMLATTRIBUTES constructs XML attributes from the arguments. This function can be used as an argument only for the XMLELEMENT
    • XMLDOCUMENT returns an XML value with a single document node and zero or more nodes as its children
    • XMLCOMMENT returns an XML value with a single comment node from a string expression
    • XMLCONCAT returns an XML sequence that contains the concatenation of a variable number of XML input arguments.
    • XMLTEXT returns an XML value with a single text node that contains the value of the argument, this can be helpful for use with XMLAGG.
    • XMLPI returns an XML value with a single processing instruction node.
  • XMLSERIALIZE returns a serialized XML value of the specified data type that is generated from the first argument. XML serialization is the process of converting XML data from its internal representation in a DB2 table to the textual XML format that it has in an application.
  • XMLPARSE parses the argument as an XML document and returns an XML value, it can strip or preserve the white spaces. This can be useful for transforming a textual XML to an XML value.
  • XMLXSROBJECTID returns the XSR object identifier of the XML schema that is used to validate the XML document specified in the argument.
  • XMLCAST function returns the cast operand converted to the type specified by data-type.

Ability to access Hadoop Data

  • DB2 for z/OS is able to access and process the Hadoop data. IBM provides two user defined functions to work with IBM BigInsights:
    • HDFS_READ - reads delimiter-separated files from HDFS on an InfoSphere BigInsights cluster.
    • JAQL_SUBMIT - enables you to invoke an InfoSphere BigInsights ad hoc Jaql query from a DB2 application. This function creates a CSV file from JSON data as input for the HDFS_READ function.
  • You can learn more about these two functions in this
  • There was also a series of articles about these functions on IBM developerWorks.
  • More details about DB2 and Hadoop integration can be found in this redbook in Chapter 7.3 - Integration with Hadoop-based IBM BigInsights big data platform.
  • DB2 can also use IBM BigSQL or similar technologies (if they are DRDA compliant) to access the data in Hadoop.
1 Comment
1 Like

json_van throwing error

February 14, 2017 03:17 AM by Dhinesh Palanisamy

while i'm trying to retrieve the particular field in column , facing sql error -440. is this built-in  function comes with DB2? or we need to create 

Recent Stories
New V12 functionality - UNLOAD authority! Why? What is the point?

Partition By Growth Table Spaces - Partition 2, Living with PBGs

Partition By Growth Table Spaces - Partition 1, Getting Started