Introduction to XML in DB2 11 for z/OS – Part 5 – XPath basics
In today’s post, we get back on track with how to access and manipulate XML documents in DB2 for z/OS, this time focusing on XML data manipulation techniques that go beyond selecting, inserting, updating, or deleting a whole XML document. Before learning the SQL/XML functions that perform such operations, it is essential to introduce the XPath and XQuery languages, because both are used in many of the SQL/XML functions. Please note that this series covers only a very basic subset of XPath and XQuery, just enough to get started with the SQL/XML functions.
DB2 for z/OS supports the XQuery 1.0 and XPath 2.0 language standards. However, not every feature of the XQuery 1.0 and XPath 2.0 standards is supported in DB2. This series won’t provide a complete list of what is and is not supported; refer to the pureXML Guide for complete information about the DB2 XQuery and XPath languages.
Today’s post introduces the XPath language and some basic XPath expressions. The XQuery language and the SQL/XML functions will be covered later.
XML Path Language (XPath) is a W3C standard. It is a language for addressing (or selecting) nodes of an XML document. The building block of an XPath expression is the location path expression. Location path expressions are easiest to understand when you think of an XML document in its tree representation, which is also how DB2 actually stores an XML document. Refer to one of the previous posts for more information about the internal structure of an XML document in DB2 for z/OS.
Consider the following XML document and its hierarchical tree representation. We will use it in the examples that follow.
<?xml version= "1.0" ?>
<article aid="00001" promoted="YES">
Best Chicken Tikka Masala
<article aid="00002" promoted="NO">
How to Make Besan Ladoo
[Figure: hierarchical tree representation of the example document]
Legend:
- Gray – document node
- Dark blue – element
- Orange – attribute
- Light blue – element value
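To make the tree representation concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module. It assumes a complete version of the example document with a root element articles and a title element holding each article’s name; both element names are assumptions, not taken from the listing above. The sketch prints every node labelled by its kind, mirroring the legend: document node, elements, attributes, and element values (text).

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the example document: the root element
# <articles> and the <title> element are assumed names.
SAMPLE = """<?xml version="1.0"?>
<articles>
  <article aid="00001" promoted="YES">
    <title>Best Chicken Tikka Masala</title>
  </article>
  <article aid="00002" promoted="NO">
    <title>How to Make Besan Ladoo</title>
  </article>
</articles>"""


def describe(elem, depth=1, out=None):
    """Return the subtree of elem as indented lines labelled by node kind."""
    if out is None:
        out = []
    pad = "  " * depth
    out.append(f"{pad}element: {elem.tag}")
    for name, value in elem.attrib.items():
        out.append(f"{pad}  attribute: {name}={value}")
    # Whitespace-only text (indentation between tags) is not shown
    if elem.text and elem.text.strip():
        out.append(f"{pad}  text: {elem.text.strip()}")
    for child in elem:
        describe(child, depth + 1, out)
    return out


# ElementTree has no document node object, so it is printed as a label
lines = ["document node"] + describe(ET.fromstring(SAMPLE))
print("\n".join(lines))
```

The indentation plays the role of the tree edges: each article element sits below articles, and its attributes and title element sit below it.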
Does the picture above remind you of a file system’s directory structure? That is not a coincidence. Let’s take a look at the basic location path expressions first.
Basic location path expression
XPath uses a location path to navigate to a single node or a set of nodes in an XML document. Just like paths in a file system, location paths can start either at the top of the tree (an absolute path, starting with "/") or at some other place in the tree (a relative path, which starts with a step such as an element name and is evaluated from the context node).
The table below lists common location path expressions.
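Expression   Description
/            selects the root (document) node; between steps, it separates them
.            selects the current (context) node
..           selects the parent of the context node
//           selects descendant nodes at any depth
@            selects an attribute
*            selects all child elements (wildcard)
or           OR condition (in predicates)
|            union of two node sequences (unsupported in DB2)
and          AND condition (in predicates)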
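The navigation expressions in the table can be tried out with Python’s standard xml.etree.ElementTree module. Note that ElementTree implements only a limited subset of XPath; it is not DB2’s XPath 2.0 implementation. The sketch assumes a hypothetical version of the example document with a root element articles and a title child element. ElementTree has no separate document node and cannot select attribute nodes, so absolute paths are approximated by evaluating from the root element, and attribute values are read with get().

```python
import xml.etree.ElementTree as ET

# Hypothetical version of the example document (<articles> and <title> are
# assumed element names).
SAMPLE = """<?xml version="1.0"?>
<articles>
  <article aid="00001" promoted="YES">
    <title>Best Chicken Tikka Masala</title>
  </article>
  <article aid="00002" promoted="NO">
    <title>How to Make Besan Ladoo</title>
  </article>
</articles>"""

root = ET.fromstring(SAMPLE)  # no document node here: root is <articles>

# XPath /articles/article  ~  child step "article" evaluated from <articles>
articles = root.findall("article")
# XPath //title  ~  ".//title": <title> elements at any depth below the root
titles = [t.text for t in root.findall(".//title")]
# XPath *  ~  all child elements of the context node
child_tags = [c.tag for c in root.findall("*")]
# XPath .  ~  the context node itself
context = root.find(".")
# XPath ..  ~  go down to <title>, then back up to its parent <article>
parents = root.findall("article/title/..")
# XPath @aid  ~  ElementTree cannot select attribute nodes, so read the value
aids = [a.get("aid") for a in articles]

print(len(articles), titles, child_tags, context.tag, len(parents), aids)
```

ElementTree only resolves .. within the path being evaluated, which is why the parent step above first moves down to title and then back up.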
In each step of a path, nodes can be selected by an element name, by a wildcard, or by one of the expressions in the following table, which select nodes of a specific type. This node selection is called a node test.
Node test    Description
text()       selects text nodes
attribute()  selects attribute nodes
node()       selects nodes of any type
comment()    selects comment nodes
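Python’s ElementTree does not implement kind tests such as text() or comment(), so the following sketch models them by hand: it lists the child axis of an element as text, comment, and element nodes, and lists the attribute axis separately. It assumes Python 3.8 or later (for insert_comments) and a hypothetical article snippet containing a comment; the helper names child_nodes and attribute_nodes are made up for this illustration. Note that node() on the child axis does not return attributes; attributes are reached only through the attribute axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippet with attributes, a text node and a comment node
SNIPPET = ('<article aid="00001" promoted="YES">'
           'Best Chicken Tikka Masala<!-- draft --></article>')

# ElementTree drops comments by default; insert_comments keeps them (3.8+)
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
article = ET.fromstring(SNIPPET, parser=parser)


def child_nodes(elem):
    """List the child axis of elem as (kind, value) pairs in document order."""
    nodes = []
    if elem.text:
        nodes.append(("text", elem.text))
    for child in elem:
        if child.tag is ET.Comment:
            nodes.append(("comment", child.text))
        else:
            nodes.append(("element", child.tag))
        if child.tail:  # text that follows a child is also a child text node
            nodes.append(("text", child.tail))
    return nodes


def attribute_nodes(elem):
    """List the attribute axis of elem as (name, value) pairs."""
    return list(elem.attrib.items())


text_nodes = [v for k, v in child_nodes(article) if k == "text"]        # text()
comment_nodes = [v for k, v in child_nodes(article) if k == "comment"]  # comment()
any_nodes = child_nodes(article)            # node(): every child, no attributes
attributes = attribute_nodes(article)       # @attribute(): the attribute axis

print(text_nodes, comment_nodes, any_nodes, attributes)
```

Processing instructions, which node() would also match in XPath, are not modelled in this sketch.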
Let’s use the expressions from both tables above to create a couple of basic selections from the example XML document. The table below lists the location path expressions and their results.
An element name in a location path acts as a node test. It gives us all the branches with that name that we can take when moving down the tree, but it does not let us select a single branch. That’s where XPath predicates come into play: they allow you to filter the set of possible branches. Predicates are enclosed in square brackets ([ ]) and specified in the location path itself, right after the step they filter. An XPath predicate evaluates to either true or false for each node, and inside it you can use comparison operators (such as = or !=) and the boolean operators or and and.
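Python’s ElementTree can evaluate a simple predicate such as [@promoted='YES'], but not the boolean operators and and or, so the sketch below applies predicates as Python functions to the nodes selected by a step. It assumes a hypothetical version of the example document (root element articles, child element title); the helper name select is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical version of the example document (assumed element names).
SAMPLE = """<?xml version="1.0"?>
<articles>
  <article aid="00001" promoted="YES">
    <title>Best Chicken Tikka Masala</title>
  </article>
  <article aid="00002" promoted="NO">
    <title>How to Make Besan Ladoo</title>
  </article>
</articles>"""

root = ET.fromstring(SAMPLE)
articles = root.findall("article")  # /articles/article (root is <articles>)


def select(nodes, predicate):
    """Keep the nodes for which predicate(node) is true, like an XPath [ ... ]."""
    return [n for n in nodes if predicate(n)]


# /articles/article[@promoted="YES"]/title
promoted = [a.find("title").text
            for a in select(articles, lambda a: a.get("promoted") == "YES")]

# ElementTree can evaluate this simple predicate natively
native = root.findall("article[@promoted='YES']")

# /articles/article[@promoted="NO" or @aid="00001"]
either = [a.get("aid") for a in select(
    articles, lambda a: a.get("promoted") == "NO" or a.get("aid") == "00001")]

# /articles/article[@promoted="YES" and @aid="00002"]
both = select(articles,
              lambda a: a.get("promoted") == "YES" and a.get("aid") == "00002")

# /articles/article[2]: a numeric predicate tests the position, counted from 1
second = articles[2 - 1]

print(promoted, either, both, second.get("aid"))
```

The or filter keeps both articles because each one satisfies one of the two conditions, while the and filter keeps none.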
The table below shows the location path expressions with predicates and the result of the selections.
Axes and unabbreviated syntax
The XPath syntax we have used so far is the abbreviated syntax. The full syntax is more verbose, but it is more descriptive and allows more options. In the full, unabbreviated syntax, every step of the path specifies its axis explicitly. The axis, such as “child” or “descendant”, specifies the direction to navigate from the context node. There are more axes, such as following-sibling, ancestor-or-self, and so on. Some of the above examples would be rewritten in unabbreviated syntax as shown in the following table.
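The relationship between the two syntaxes can be expressed as a small rewriting function that applies the standard XPath expansions: a bare name becomes child::name, @ becomes attribute::, . becomes self::node(), .. becomes parent::node(), and // becomes /descendant-or-self::node()/. This is a sketch only; the function name unabbreviate is hypothetical, and inside predicates it expands @ but leaves any other abbreviations untouched.

```python
def _split_steps(path):
    """Split a path on '/' characters that are outside predicates and quotes."""
    steps, buf, depth, quote = [], [], 0, None
    for ch in path:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        elif ch == "/" and depth == 0:
            steps.append("".join(buf))
            buf = []
            continue
        buf.append(ch)
    steps.append("".join(buf))
    return steps


def _expand_predicates(preds):
    """Rewrite '@name' inside predicates as 'attribute::name' (quotes untouched)."""
    out, quote = [], None
    for ch in preds:
        if quote:
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch == "@":
            out.append("attribute::")
            continue
        out.append(ch)
    return "".join(out)


def _expand_step(step):
    """Expand one abbreviated step such as 'title', '@aid', '..' or 'article[1]'."""
    cut = step.find("[")
    test, preds = (step, "") if cut < 0 else (step[:cut], step[cut:])
    if "::" in test:
        axis_test = test                      # already unabbreviated
    elif test == ".":
        axis_test = "self::node()"
    elif test == "..":
        axis_test = "parent::node()"
    elif test.startswith("@"):
        axis_test = "attribute::" + test[1:]
    else:
        axis_test = "child::" + test          # child is the default axis
    return axis_test + _expand_predicates(preds)


def unabbreviate(path):
    """Rewrite an abbreviated location path into unabbreviated syntax."""
    if path == "/":
        return "/"
    steps = _split_steps(path)
    absolute = steps[0] == ""
    if absolute:
        steps = steps[1:]
    # An empty step comes from '//', which stands for '/descendant-or-self::node()/'
    expanded = [_expand_step(s) if s else "descendant-or-self::node()" for s in steps]
    return ("/" if absolute else "") + "/".join(expanded)


for p in ["/articles/article[@promoted='YES']/title", "//title", "../@aid"]:
    print(p, "=>", unabbreviate(p))
```

For example, //title expands to /descendant-or-self::node()/child::title, which shows why the abbreviated form reaches title elements at any depth.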
As said, this post covers only the very basics of XPath expressions. Refer to the pureXML Guide, to https://en.wikipedia.org/wiki/XPath, or (for advanced readers) to www.w3.org for more information about the XPath language standards.
XPath is a language for selecting nodes of an XML document. The most important kind of XPath expression is the location path. A location path consists of a sequence of steps, each of which has three components: an axis, which specifies the direction from the context node; a node test; and zero or more predicates that filter the nodes to be selected. Abbreviated XPath expressions are similar to file system paths or Uniform Resource Identifiers (URIs). An XPath expression is one kind of XQuery expression; the XQuery language will be introduced in the next post.
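The three components of a step can be modelled directly. The sketch below represents each step as an axis, a node test, and an optional predicate, and evaluates an absolute path one step at a time, feeding the result of each step into the next as its context. It assumes a hypothetical version of the example document (root element articles, child element title) and is a simplification: only element nodes and a handful of axes are supported, a synthetic #document element stands in for the document node, and positional predicates are not handled. The names Step and evaluate are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical version of the example document (assumed element names).
SAMPLE = """<?xml version="1.0"?>
<articles>
  <article aid="00001" promoted="YES">
    <title>Best Chicken Tikka Masala</title>
  </article>
  <article aid="00002" promoted="NO">
    <title>How to Make Besan Ladoo</title>
  </article>
</articles>"""


@dataclass
class Step:
    axis: str                                                # direction to move
    test: str                                                # element name or "*"
    predicate: Optional[Callable[[ET.Element], bool]] = None  # like [ ... ]


def evaluate(root: ET.Element, steps: List[Step]) -> List[ET.Element]:
    """Evaluate an absolute location path over element nodes only."""
    doc = ET.Element("#document")  # synthetic stand-in for the document node
    doc.append(root)
    parent = {child: p for p in doc.iter() for child in p}

    def along(node, axis):
        if axis == "child":
            return list(node)
        if axis == "descendant":
            return [d for d in node.iter() if d is not node]
        if axis == "self":
            return [node]
        if axis == "parent":
            return [parent[node]] if node in parent else []
        if axis == "following-sibling":
            if node not in parent:
                return []
            siblings = list(parent[node])
            return siblings[siblings.index(node) + 1:]
        raise ValueError(f"unsupported axis: {axis}")

    context = [doc]
    for step in steps:
        result = []
        for node in context:
            for cand in along(node, step.axis):
                name_ok = cand is not doc and step.test in ("*", cand.tag)
                pred_ok = step.predicate is None or step.predicate(cand)
                if name_ok and pred_ok and cand not in result:
                    result.append(cand)
        context = result
    return context


root = ET.fromstring(SAMPLE)

# /child::articles/child::article[attribute::promoted="YES"]/child::title
promoted = evaluate(root, [
    Step("child", "articles"),
    Step("child", "article", lambda e: e.get("promoted") == "YES"),
    Step("child", "title"),
])
# /descendant::title/parent::article
with_title = evaluate(root, [Step("descendant", "title"), Step("parent", "article")])
# /child::articles/child::article[attribute::aid="00001"]/following-sibling::article
next_article = evaluate(root, [
    Step("child", "articles"),
    Step("child", "article", lambda e: e.get("aid") == "00001"),
    Step("following-sibling", "article"),
])

print([e.text for e in promoted], [e.get("aid") for e in with_title],
      [e.get("aid") for e in next_article])
```

Keeping the axis, node test, and predicate as separate fields mirrors the unabbreviated syntax, where each of the three parts is written out explicitly in every step.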