April 2015 “Fun with SQL” third Challenge

April 2015 “Fun with SQL” third Challenge

This month on the Content Committee blog we’ll be doing an online version of the “Fun with SQL” dojo. Each week we’ll present an SQL challenge and request the community to solve it. People can respond to the challenge in the comments section of the blog post, or they can email Kurt directly at kurt.struyf@lone-star.be with a solution or idea and he will post it to the blog. You can post a possible answer, ask a question, and comment on something someone else has posted, or offer alternatives to something that’s been posted. Don’t be shy about posting your answer if someone else has already posted. At the end of the month, the content committee will take the names of the people that have participated and randomly select one to receive a reward of an IPad case, courtesy of IBM! So, don’t be shy, just dive in and share your experience or inexperience and get a free and fun education!

This week we will focus on XML and SQL. Here is the third SQL challenge, it's twofold and  based upon the DB2 sample database.

  1. Retrieve the information from the XML column “description”  from the product table
  2. Provide a list of all product names (but just the names) out of the product description where part of the name is “Scraper”

 have fun

Kurt Struyf

9 Comments

Solution

April 7, 2015 03:47 PM by Joe Carroll

SELECT XMLSERIALIZE(DESCRIPTION AS CLOB EXCLUDING XMLDECLARATION)

FROM PRODUCT

WITH UR;

 

SELECT XMLQUERY('declare default element namespace "http://posample.org";

                                   $A/product/description/name/text()' PASSING DESCRIPTION AS A)

FROM PRODUCT

WHERE XMLEXISTS('declare default element namespace "http://posample.org";

                                   $A/product/description[fn:contains(name, "Scraper")]' PASSING DESCRIPTION AS A)

;

 

Queris need to handle a sequence of descriptions

April 8, 2015 07:36 PM by Robert Mala

To understand this puzzle you need to look at the xml schema of the documents.   You can use this statement to return xml schema:

 

select xmlparse( document component )

  from syscat.xsrobjectcomponents

 where objectschema = 'DB2INST1'

   and objectname = 'PRODUCT'

  with ur

;

 

Turns out that the xml schema allows a document to have any number of descriptions:

 

<xs:element name="description" minOccurs="0" maxOccurs="unbounded">

 

To prove this use you can use this statement to insert a document with 4 descriptions:

 

insert into db2inst1.product (

  pid

, description

)

values (

  'XXX-XXX-XX'

, xmlvalidate(

    xmlparse( document

      '<product pid="XXX-XXX-XX">

         <description>

           <name>Ice Scraper, Windshield 4 inch</name>

           <details>Ice Scraper 4 inches wide</details>

         </description>

         <description>

           <name>Basic Ice Scraper, Windshield 4 inch</name>

           <details>Basic Ice Scraper 4 inches wide, foam handle</details>

           <price>3.99</price>

         </description>

         <description>

           <name>Delux Ice Scraper, Windshield 5 inch</name>

           <details>Delux Ice Scraper 5 inches wide, neoprene handle</details>

           <price>7.99</price>

         </description>

         <description>

           <name>Super Delux Ice Scraper, Windshield 6 inch</name>

           <details>Super Delux Ice Scraper 6 inches wide, leather handle</details>

           <price>15.99</price>

         </description>

       </product>'

    )

    according to xmlschema id db2inst1.product

  )

)

;

 

You could argue the xml schema is wrong and should be changed so only one description is allowed.   But this is supposed to be a puzzle and that would be no fun!

 

This means queries need to handle a sequence of descriptions.

More fun with sequences...

April 9, 2015 02:48 AM by Matthias Nicola

Hmm.... interesting. So, for the sake of extending the puzzle let's assume that there could be documents that contain multiple descriptions, as Robert suggested. Would the query that Joe has posted still work?

Yes, it would still run but it might no longer return the desired result. For example, assume a document with multiple descriptions and only one of those is a "scraper". I suspect the query would return *all* the names from that document, not just the one scraper.

The reason for this behavior is that the XMLEXISTS predicate in the SQL WHERE clause really only filters at the row (document) level, but not within any given document.

So, to handle multiple descriptions correctly, the query would need to be adjusted. One idea is to add an additional predicate to the query, another idea is to write the SELECT statement without XMLEQUERY and without XMLEXISTS....

 

Modifield Solution for documents with multiple descriptions

April 9, 2015 07:27 AM by Joe Carroll

Excellant points.  Let's say I add another description to Robert's document so that I have a "non Scraper" product.

<product xmlns="http://posample.org" pid="XXX-XXX-XX">
 <description>
  <name>Ice Scraper, Windshield 4 inch</name>
  <details>Ice Scraper 4 inches wide</details>
 </description>
 <description>
  <name>Basic Ice Scraper, Windshield 4 inch</name>
  <details>Basic Ice Scraper 4 inches wide, foam handle</details>
  <price>3.99</price>
 </description>
 <description>
  <name>Delux Ice Scraper, Windshield 5 inch</name>
  <details>Delux Ice Scraper 5 inches wide, neoprene handle</details>
  <price>7.99</price></description>
 <description>
  <name>Super Delux Ice Scraper, Windshield 6 inch</name>
  <details>Super Delux Ice Scraper 6 inches wide, leather handle</details>
  <price>15.99</price>
 </description>
 <description>
  <name>Snow Blower</name>
  <details>Big boy snow blower</details>
  <price>415.99</price>
 </description>
</product>

I could modify the query to have the contains() function in both the XMLQUERY and the XMLEXISTS.  This would only qualify rows with XML that met the condition and only return descriptions that contained the word scraper, but now the desriptions from any multi-description document are on a single line.


 SELECT XMLQUERY('declare default element namespace "http://posample.org"; 
                 $A/product/description/name[fn:contains(., "Scraper")]/text()'
                 PASSING DESCRIPTION AS A)
  FROM DSN8B10.PRODUCT
 WHERE XMLEXISTS('declare default element namespace "http://posample.org"; 
                 $A/product/description[fn:contains(name, "Scraper")]'
                 PASSING DESCRIPTION AS A)
;

To return each description on a separate line, let's swap out XMLQUERY for XMLTABLE.


 SELECT X.PROD_DESC
   FROM DSN8B10.PRODUCT
       ,XMLTABLE(XMLNAMESPACES(DEFAULT 'http://posample.org')
                 , '$A/product/description/name[fn:contains(., "Scraper")]' 
                 PASSING DESCRIPTION AS A
           COLUMNS
              PROD_DESC            VARCHAR(100) PATH '.'
                ) AS X
  WHERE XMLEXISTS('declare default element namespace "http://posample.org"; 
                 $A/product/description[fn:contains(name, "Scraper")]'
   PASSING DESCRIPTION AS A)
;

 

Overly complicated, but fun solution

April 9, 2015 09:07 AM by Joe Carroll

Of course, it is Spring, so how about some FLOWR's.  We could build a new master document based on all of the documents that have Scraper in the name and then loop through the names looking for the ones that contain the word Scraper.  Not the best, but slightly more fun.

SELECT X.PROD_DESC
  FROM XMLTABLE(XMLNAMESPACES(DEFAULT 'http://posample.org')
                 , 'for $desc in $A/product/description/name
                    where contains($desc, "Scraper")
                    return <result>{data($desc/text())}</result>' 
  
        PASSING (SELECT XMLELEMENT(NAME "doc", XMLNAMESPACES(default 'http://posample.org')
                         ,XMLAGG(DESCRIPTION))
                  FROM DSN8B10.PRODUCT
                 WHERE XMLEXISTS('declare default element namespace "http://posample.org"; 
                                  $A/product/description[fn:contains(name, "Scraper")]'
                                  PASSING DESCRIPTION AS A
                                )                  
                 ) AS A
        COLUMNS PROD_DESC       VARCHAR(100)        PATH '.'
               ) AS X
;

 

Possible Solutions

April 9, 2015 10:08 AM by Michael Belt

SELECT

  DESCRIPTION

FROM DSN8100.PRODUCT;

 

SELECT

  XMLCAST(

    XMLQUERY(

       'declare namespace x = "http://posample.org"; 

      /x:product/x:description/x:name' PASSING DESCRIPTION)

        AS VARCHAR (256)) PRODUCT_NM

FROM DSN8100.PRODUCT 

WHERE

  XMLEXISTS(

    'declare namespace x = "http://posample.org"; 

    /x:product/x:description[fn:contains(x:name, "Scraper")]' PASSING DESCRIPTION);

Alternate

April 9, 2015 11:28 AM by Michael Belt

Here's another option.  I don't think it would perform well, but it was fun to write & it solves both at once (it get's the description info [in the WITH statement] & then selects name filtered by "Scraper"s).

 

WITH SHRED_PRODUCT_DESC
    AS (SELECT
            PRODUCT_DESC.NAME_TX,
            PRODUCT_DESC.DETAILS_TX,
            PRODUCT_DESC.PRICE_AM,
            PRODUCT_DESC.WEIGHT_TX
        FROM DSN8100.PRODUCT,
        XMLTABLE(
            XMLNAMESPACES(DEFAULT 'http://posample.org'), 
            '/product/description' PASSING DESCRIPTION COLUMNS
                NAME_TX VARCHAR(256) PATH 'name',
                DETAILS_TX VARCHAR(256) PATH 'details',
                PRICE_AM DECIMAL(10,2) PATH 'price',
                WEIGHT_TX VARCHAR(64) PATH 'weight'
            ) AS PRODUCT_DESC)

SELECT NAME_TX FROM SHRED_PRODUCT_DESC WHERE NAME_TX LIKE '%Scraper%';

Some other thoughts...

April 9, 2015 08:48 PM by Robert Mala

I’m really enjoying the discussion and offer some other thoughts.   I have a general principle of maintaining the data type unless there’s good reason not to.   The puzzle doesn’t give any indications as to what form the result should take or how it will be used.   So I’ll interpret the puzzle in the broader context and offer an xquery only solution.

 

xquery

  fn:distinct-values(

    for $name in db2-fn:xmlcolumn('DB2INST1.PRODUCT.DESCRIPTION')/product/description/name

    where fn:matches($name,"(^|[^\w]+)scraper([^\w]+|$)","i")

    return $name

  )

;

 

There are a few of reasons why I like this approach.   There’s no switching from an xml to relational data model or data types.   It’s the right query language for the job at hand having features specifically for xml and functions not currently available in SQL.   Above I remove duplicates and use the fn:matches to find whole words regardless of case.   Also, xquery can be embedded in applications just as easily as SQL.

 

Keep in mind there can be issues when casting between data types if the xml schema doesn’t explicitly constrain the content.   I’d cast xs:string to CLOB to be reasonably confident of not getting an error.   Otherwise arbitrary limits and indicators are needed.

 

select distinct

       x.name30

     , x.ind

  from product p

     , xmltable( '$DESCRIPTION/product/description[fn:matches(name,"(^|[^\w]+)scraper([^\w]+|$)","i")]' columns

         name30 varchar(30) path 'fn:substring(name,1,30)'

       , ind    integer     path 'fn:boolean(name[fn:string-length(.) gt 30])'

       ) x

;

 

Re: using xquery

April 22, 2015 03:31 PM by Peter Vanroose

Robert,

Strictly speaking, you're not returning only the "Scraper"s, as asked, but instead all the "scraper" variants in a case insensitive way (since you added the "i" option to fn:matches).

By default regular expressions (and XML) are case sensitive, as are the LIKE and the fn:contains() in the other solutions.

Making them also case-insensitive is of course relatively straightforward (e.g. using the "upper" SQL function).


--      Peter Vanroose.

 

Recent Stories
Selecting Appropriate Data Architecture Technologies

SQL Tuning for Developers – Part 2 Explain Output

Some of my favorite tunings tips and tricks for DB2 by Joe Geller