SQL XML BASICS:
Ever get confused on XML terminology? This article lays out the basics of processing and querying of XML stored documents in Db2 and helps to better understand the differences in terms.
Db2 pureXML: pureXML is the native XML storage feature in IBM Db2. It not only stores and manages your XML database in its native hierarchical format, but provide querying capabilities, storage, and indexing technologies. Db2 stores traditional Db2 data types (CHAR, SMALLINT, DATE, ETC.) and has a datatype specific for assigning to XML documents (XML is the datatype). By assigning this datatype to a column name, Db2 stores the contents of a document in a parsed format leaving intact the hierarchical structure of the XML document. pureXML not only allows for the storing of XML documents, but also the ability to retrieve all or only a portion of a document.
DDL Example:
CREATE TABLE EMP
(EMPNO CHAR(6) NOT NULL,
LASTNAME VACHAR(25) NOT NULL,
FIRSTNME VARCHAR(25) NOT NULL,
MIDINIT CHAR(1) NOT NULL,
…..
….
EMP_PATIENT_XMLDOC XML NOT NULL Note XML data type
)
What is XML? It is a simple approach of data based on structure. It marks sections of the data document with descriptive tags (also called markups), and not limited to a fixed set of them (extensible). XML is independent of software. XML can move through software upgrades and to different software products without the fear that it will become incompatible with its new environment. It is:
1) Flexible
2) Easy to share
3) Self-describing
4) Neutral for exchanging data among diverse devices
- The universal standard for data interchange
- Independent of the platform or vendor
5) Growth of XML data
- XML-based industry and data standards
- XML data becoming more critical to enterprise operations
XML Document Example: Note the NODES of data elements (<name>, <addr>, etc.). Each node of the tree is an XML element and is written with an opening and closing tag

What is an XML Document?
1) eXtensible Markup Language: XML is a language designed to describe, structure, and
transport electronic data documents
2) A hierarchical data model
3) Self describing
4) Well-formed: A document that conforms to the XML syntax rules, like:
- It must begin with the XML declaration
- It must have one unique root element
- Start-tags must have matching end-tags
- Elements are case sensitive
- All elements must be closed
- All elements must be properly nested
- All attribute values must be quoted
- Entities must be used for special characters
With XML documents, you always want to make sure that all documents within a table are consistent and follow the same patterns. This can be done by validating the document based on a specific SCHEMA definition of the document. Db2 supports XML document validation with XML schemas:
- XML schemas used for validation are registered with Db2
- Optionally, supplied IBM-provided User Defined Function
can be used to validate XML docs during insert or DSN_XMLVALIDATE
What is XQUERY? XQuery is a language for processing XML documents as a whole or parts of a document. XQuery code provides the means to store, extract and manipulate XML documents. In order to query against an XML document, one must know to contents, tags, order of nodes , etc…
- XQuery is a generalized language for interpreting, retrieving, and modifying XML data.
- XQuery, SQL, or a combination of both can be used to query and update XML data.
- You can use SQL to update entire documents from XML columns, just as you retrieve data from any other type of column. You can also update entire or portions of documents with XQuery functions. XMLQUERY, XMLSERIALIZE, XMLEXISTS, XMLCAST are a few of the most common function.
- XQuery is to XML as SQL is to a RDBMS.
What is XPath? XPath (XML Path Language) is a XML path language that is used to select nodes from an XML document using queries. XQuery contains a set of XPath expressions to query/process specific parts (nodes) of an XML document. An XPath expression is referred to simply as an Xpath. They are used to isolate parts (nodes) of XML documents.
- Similar to file path notations in file systems
- Based on keywords, symbols, and operands
- Db2 XPath can be used in the XMLQUERY SQL built-in function to extract specific parts of an XML document or in the XMLEXISTS SQL predicate
to extract data from specific XML documents
How do you query an XML Document? You can query a XML column using SQL, but you then get the whole document. Program code would then have to process through the document to get what it wanted specifically out of the document. For Example:
SELECT EMPNO, LASTNAME, EMP_PATIENT_XMLDOC
INTO ……
FROM EMP
SQL does allow for embedding XQuery or XPath expressions/functions within an SQL statement. Used especially for query logic specific to nodes of data within a document. The different expressions are broke out into two areas:
Publishing: XMLELEMENT, XMLATTRIBUTE, XMLFOREST, XMLAGG
Querying: XMLQUERY, XMLTABLE, XMLEXISTS
You query through a document by using some of the common XQUERY functions and XPATH navigation. XPATH also comes with its own set of functions. fn:sum, fn_avg, and fn_upper are a few shown of the XPATH functions highlighted.
- XMLQUERY and XMLEXISTS. Passing clause states XML document column name.
SELECT EMPNO,
XMLQUERY('/patient/email' passing EMP_PATIENT_XMLDOC) as EMAIL
FROM EMP
WHERE XMLEXISTS ('/patient[email]' passing EMP_PATIENT_XMLDOC)
;
Returns only the email address with all the documents that have an email node.
EMPNO EMAIL
---------- ----------------------------------------------------------------------------------------------------------
123456 <?xml version="1.0" encoding="IBM037"?><email>jdoe@mymail.com</email>
Notice the xml version and encoding information that came along with the email address. This is called the XML declaration. XML declarations contain details that prepare an XML processor to parse the XML document. In order to not have this show up as queries go against a document, the XMLSERIALIZE by default will not show the declaration information. XMLSERIALIZE is a SQL/XML operator that you can use to convert an XML type to a character type.
SELECT EMPNO,
XMLSERIALIZE( XMLQUERY('/patient/email' passing EMP_PATIENT_XMLDOC)
AS CLOB(1K)
) AS EMAIL
FROM EMP
WHERE XMLEXISTS ('/patient[email]' passing EMP_PATIENT_XMLDOC)
Returns the email address without the declaration information.
EMPNO EMAIL
---------- ----------------------------------------------------------------------------------------------------------
123456 <email>jdoe@mymail.com</email>
- XMLQUERY/XPATH Sum. With no WHERE logic, this query will total up across all documents in the table the values in the cost node.
SELECT XMLQUERY ('fn:sum(/patient/service/cost)'
PASSING EMP_PATIENT_XMLDOC)
)
FROM EMP;
- XMLQUERY /XPATH Avg. With no WHERE logic will total up and then average all cost.
SELECT XMLCAST( XMLQUERY ('fn:avg(/patient/service/cost)'
PASSING EMP_PATIENT_XMLDOC) as decimal(12,2)
)
FROM PATIENT
;
- XMLQUERY/XPATH Upper. Brings back the upper case of names in each document.
SELECT ( XMLQUERY('/patient/fn:upper-case(name)'
PASSING EMP_PATIENT_XMLDOC
)
FROM EMP;
;