Can You Replace an Entire Application Program with a Single OLAP Query?

You bet you can! If you are using any currently supported version of DB2 for LUW, or are using DB2 10 for z/OS (or DB2 9 for partial support), then you can take advantage of some extremely powerful SQL constructs called OLAP expressions. These expressions, sometimes also referred to as window functions, are typically utilized for report queries and can generate some extremely powerful reports. Let’s take a look at some of these expressions to get an example of what they can do.

OLAP specification is best taught by example. Let’s start with a simple process and report and add to it as we go along. OLAP specification allows for numbering of the result set. This numbering can be according to a specified order, or not. It can also be applied to something called a “partition” or “window” of the result table. The entire result set can be a window, and that’s what is happening in this example.

SELECT EMP.LASTNAME, EMP.SALARY,

      ROW_NUMBER() OVER()

FROM   EMP EMP

ORDER  BY EMP.LASTNAME

Here we are selecting data from the employee table, returning the lastname and salary of our employees. We’ve specified that the result will be ordered by the lastname column. We’ve also specified the ROW_NUMBER() window function in the final SELECT of the statement. The ROW_NUMBER() function tells DB2 that the output row is to be numbered according to the ordering applied to the function, starting with the number 1 and continuing by adding 1 to the number for each additional row returned. If no ORDER BY is specified in the window then the numbering is arbitrary with respect to the order of the result table. Here specifically we said “ROW_NUMBER() OVER()”. We have specified no window and no ordering, so the rows are numbered arbitrarily in the result set. The ORDER BY clause of the final SELECT (the only SELECT in this example) has no meaning for the numbering, so don’t be fooled by a coincidental numbering in the order of the result.

LASTNAME        SALARY      3                  

--------------- ----------- -------------------

ADAMSON            55280.00                    1

ALONZO              31840.00                    2

BROWN              57740.00                    3

GEYER              80175.00                    4

GOUNOT             43840.00                    5

HAAS              152750.00                  6

HEMMINGER          46500.00                    7

HENDERSON          89750.00                    8

JEFFERSON          42180.00                    9

. . . . .

 

In this next example, we’ve added a little more meaning to the numbering of our output by actually specifying an order by which the numbering is applied.

SELECT EMP.LASTNAME, EMP.SALARY,

ROW_NUMBER() OVER(ORDER BY SALARY DESC)

FROM   EMP EMP

ORDER  BY EMP.LASTNAME

 

So, in this example we have specified “ROW_NUMBER() OVER(ORDER BY SALARY DESC)”. There is no window specified and so the numbering is over the entire result set. However, we have specified the order in which the rows are to be numbered in the result set. So, the rows are numbered in the entire result set in the order of the SALARY column by descending value.

LASTNAME        SALARY      3                  

--------------- ----------- -------------

ADAMSON            55280.00                   18

ALONZO             31840.00                   42

BROWN              57740.00                   17

GEYER              80175.00                    7

GOUNOT             43840.00                   30

HAAS              152750.00                  1

HEMMINGER           46500.00                   26

HENDERSON          89750.00                    5

JEFFERSON          42180.00                   31

JOHN               69840.00                   10

JOHNSON            47250.00                   25

JONES              68270.00                   13

KWAN               98250.00                    2

LEE                45370.00                   28

. . . .

 

Each row returned gets a number one greater than the previous row. Also notice that the ORDER BY clause of the final result table is dictating an order by LASTNAME. So the numbering is in the different sequence (SALARY DESC) than the result set (LASTNAME ASC). It’s already becoming clear that we can create some outstanding reports simply from SQL. Cool!

In this next example, partitioning, also called windowing, has been introduced. In the specification of what the numbering will be over is “OVER(PARTITION BY EMP.WORKDEPT ORDER BY EMP.SALARY DESC)”. This tells DB2 that the result table is to be divided up by the values of the WORKDEPT column and within each of those “windows” the numbering of the rows will be based upon the SALARY column in descending sequence. So, the numbering is no longer over the entire result set, but instead it is established afresh inside each partition or window.

SELECT EMP.WORKDEPT, EMP.LASTNAME, EMP.SALARY,

       ROW_NUMBER() OVER(PARTITION BY EMP.WORKDEPT

      ORDER BY EMP.SALARY DESC)

FROM   EMP EMP

ORDER  BY EMP.WORKDEPT, EMP.SALARY DESC

 

The result table is also ordered by the same two columns in the same sequence as specified by the ORDER BY clause of the final SELECT (the only SELECT in this case). So, the numbering of the rows appears consistent with the ordering of the output.

WORKDEPT LASTNAME        SALARY      4                  

-------- --------------- ----------- --------------------

A00      HAAS              152750.00                  1

A00      LUCCHESSI          66500.00                    2

A00      O'CONNELL          49250.00                    3

A00      HEMMINGER           46500.00                    4

A00      ORLANDO            39250.00                    5

B01      THOMPSON           94250.00                    1

C01      KWAN               98250.00                    1

C01      QUINTANA           73800.00                    2

C01      NICHOLLS           68420.00                    3

C01      NATZ               68420.00                    4

D11      STERN              72250.00                    1

D11      JOHN               69840.00                    2

D11      JONES              68270.00                    3

D11      YAMAMOTO           64680.00                    4

D11      PIANKA             62250.00                    5

D11      BROWN              57740.00                    6

D11      ADAMSON            55280.00                    7

D11      SCOUTTEN           51340.00                    8

. . .

The numbering of the output is simply that. There is no respect to the data in the result table and the next number is simply one more than the previous row within the window. So, even though Nicholls and Natz have the same salary, they do not receive the row number. This leads us into ranking. Ranking differs from numbering in that if two or more rows within the window are not distinct, they will receive the same rank. So, while numbering is based upon the number of rows that precede the current row, ranking is based upon the number of rows that strictly precede the current row. That is, a rank represents the number of rows that precede the current row based upon the values as defined in the ordering within the window. Thus, if two or more rows have the same set of values (are not distinct from each other), there will be gaps in the ranking.

SELECT EMP.WORKDEPT, EMP.LASTNAME, EMP.SALARY,

       RANK() OVER(PARTITION BY EMP.WORKDEPT

                ORDER BY EMP.SALARY DESC)

FROM   EMP EMP

ORDER  BY EMP.WORKDEPT, EMP.SALARY DESC

 

In our example here, within the window represented by department number C01, Nicholls and Natz have the same salary. Thus, the number of rows that strictly precede them is 2 (Kwan and Quintana), and thus, they both get the rank of 3. If there was another single person in the department with a lower salary than Nicholls and Natz, their rank would be 5 as the tie between Nicholls and Natz created a gap.

WORKDEPT LASTNAME        SALARY      4                  

-------- --------------- ----------- -------------------

A00      HAAS              152750.00                1

A00      LUCCHESSI    66500.00                    2

A00      O'CONNELL          49250.00                    3

A00      HEMMINGER          46500.00                    4

A00      ORLANDO            39250.00                    5

B01      THOMPSON           94250.00                    1

C01      KWAN               98250.00                    1

C01      QUINTANA           73800.00                    2

C01      NICHOLLS           68420.00                    3

C01      NATZ               68420.00                    3

D11      STERN              72250.00                    1

D11      JOHN               69840.00                    2

D11      JONES              68270.00                    3

D11      YAMAMOTO           64680.00                    4

D11      PIANKA             62250.00                    5

D11      BROWN              57740.00                    6

D11      ADAMSON            55280.00                    7

D11      SCOUTTEN           51340.00                    8

. . .

 

DENSE_RANK() works much like RANK(), except that it closes the gaps that RANK() would otherwise create. In this example, we did not specify a window and so the entire result set is the window. The first query ranks the people working for the company by salary descending. As you can see in the result, Nicholls and Natz are tied with the same salary within the window with a rank of 11, so the next rank assigned to Jones is 13 due to the gap caused by the tie.

SELECT EMP.LASTNAME, EMP.SALARY,

RANK() OVER(ORDER BY EMP.SALARY DESC)

FROM   EMP EMP

ORDER  BY EMP.SALARY DESC

 

LASTNAME        SALARY      3                  

--------------- ----------- --------------------

. . .

STERN              72250.00                    9

JOHN               69840.00                   10

NICHOLLS           68420.00                   11

NATZ               68420.00                   11

JONES              68270.00                   13

LUCCHESSI          66500.00                   14

. . .

 

The DENSE_RANK() window function will close these gaps, so in the second example, Jones receives a rank of 12.

SELECT EMP.LASTNAME, EMP.SALARY,

DENSE_RANK() OVER(ORDER BY EMP.SALARY DESC)

FROM   EMP EMP

ORDER  BY EMP.SALARY DESC

 

LASTNAME        SALARY      3                  

--------------- ----------- --------------------

. . .

STERN              72250.00                    9

JOHN               69840.00                   10

NICHOLLS           68420.00                   11

NATZ               68420.00                   11

JONES              68270.00                   12

LUCCHESSI           66500.00                   13

. . .

OLAP aggregation is new with DB2 10 for z/OS, but has been available on DB2 for LUW for some time. Aggregation can be applied over a window (average, max, min, count, and more), and GROUP clauses can group a given set of rows within a window. The groups can be specified over a number of rows or over a range of values. This really opens the door for the ability to create complex reports with varying degree of analytics incorporated into the SQL statement! This next query example produces a list of sales from our company’s sales table. Along with the list of every sale are some important key metrics that will help our marketing team focus attention. The first is the average sale by month. That is, for each month in which my sales data spans show me the average sale for that month. So, along with the list of sales is the average sale amount for the month. The second is a running total of sales over the entire result set of data. So, in one query we have detailed information along with important metrics regarding that detailed information. Sweet! This can inspire us to create some extraordinary reports using SQL.

SELECT REGION, SALES_DATE, SALES,

 CAST(AVG(CAST(SALES AS DEC(3,1)))

      OVER ( PARTITION BY YEAR(SALES_DATE),

                          MONTH(SALES_DATE)) AS DEC(3,1))

       AS "AVG SALE FOR MONTH",

 SUM(SALES) OVER (ORDER BY SALES_DATE

                  ROWS BETWEEN

                  UNBOUNDED PRECEDING AND CURRENT ROW)

       AS "RUNNING TOTAL SALES"

 FROM   SALES SALES ORDER BY SALES_DATE

Let’s break down our OLAP expressions: The first produces an average, “CAST(AVG(CAST(SALES AS DEC(3,1))) OVER ( PARTITION BY YEAR(SALES_DATE),  MONTH(SALES_DATE)) AS DEC(3,1))”. The casting is because SALES is actually an integer and we need a decimal result. “OVER(PARTITION BY YEAR(SALES_DATE),  MONTH(SALES_DATE))” tells DB2 that the average will be calculated within a window that is based on the year and month of the sales date. So, calculate the average sale for each month! No ordering is needed within the window because it’s a single value being calculated for each window. The second produces a running total, “SUM(SALES) OVER (ORDER BY SALES_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)”. There is no partitioning clause and so the window is the entire result table. The SUM function indicates a calculated total over the entire window ordered by the SALES_DATE column. The window-order-clause is followed by a window-aggregation-group-clause “ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW”, which tells DB2 that the group of values to sum will be based upon all preceding rows up to the current row. The corresponding ORDER BY in the final SELECT makes sure that the running total makes sense.

 

REGION          SALES_DATE SALES       AVG SALE FOR MONTH RUNNING TOTAL SALES

--------------- ---------- ----------- ------------------ -------------------

Ontario-North   03/29/1996           2                2.0                   2

Ontario-South   12/31/2005           1                1.6                   3

Ontario-South   12/31/2005           3                1.6                   6

Quebec          12/31/2005           1                1.6                   7

Manitoba        12/31/2005           2                1.6                    9

Quebec          12/31/2005           1                1.6                  10

Ontario-South   03/29/2006           3                4.0                  13

Quebec          03/29/2006           1                4.0                  14

Ontario-South   03/29/2006           2                4.0                  16

Quebec          03/29/2006           3                4.0                  19

Manitoba        03/29/2006           5                4.0                  24

Ontario-South   03/29/2006           3                4.0                  27

Quebec          03/29/2006           1                4.0                  28

Manitoba        03/29/2006           7                4.0                  35

. . .

I’ve seen very large programs that produce reports. These programs will issue the same simple or complex SQL statements over and over again. Many times users or managers will complain about the length of time it takes to produce a report, or that the cost of running the report is prohibitive. I truly believe that these OLAP expressions are a very effective tool and an answer to the problem of building and running these large and expensive report programs and can be utilized to reduce development time and lower the operation costs associated with running reports. The intricacies and subtleties of OLAP expressions can be a bit difficult to grasp at first, but with some practice, these things can become more apparent. Before you know it, you’ll be a report-writing genius. Get started today by cutting and pasting these queries and runnning them against your favorite DB2 sample database! Cheers!

 

Daniel L Luksetich

DanL Database Consulting

danl@db2expert.com

www.db2expert.com

 

 

1 Comment
1 Like

The Beauty of SQL

August 24, 2012 02:52 AM by Maja Markotic

Very cool :-)

Recent Stories
An Introduction to IBM Data Studio

The Basics of SQL Performance and Performance Tuning

Index Decluttering Opportunities in DB2 for Linux, UNIX, and Windows