I use DENSE_RANK a lot with both PARTITION BY and ORDER BY
PARTITION BY is like a GROUP BY in some ways but does not
summarize the rows. Every change in the PARTITION BY Column values
set, causes the DENSE_RANK row numbers to restart at 1 again. So we
are numbering within each the set of rows with a common value set
for the PARTITION BY columns.
Within that Group, the ORDER BY specifies what sequence to sort
the individual rows (duplicate on the PARTITION BY key). Dense Rank
then attaches sequential numbers starting at 1 and incrementing by
1 (with no gaps), such that if multiple sorted rows within
each group value have the same ORDER BY values (are
duplicates on ORDER BY key), then they get the same DENSE_RANK
number. ROW_NUMBER function would have given each sorted row a
different number (within each group PARTITION BY key
value) regardless of having the same value of the ORDER BY
I think RANK will be used a lot less, as it also gives duplicate
rows the same rank, but then skips some potential generated
sequential numbers to compensate for earlier duplicates in the
ORDER BY. i.e. could have sequence numbers: 1, 1, 1, 4, 4, 6, 7, 8,
8, 10, etc.
Other Aggregating functions are available, e.g. to calculate Max
or Sum of a group but retaining the detail rows. Unfortunately, can
do COUNT but not COUNT(DISTINCT col) on z/OS. Small weakness.
Would have caused too much complexity of the sort requirements. Can
take MIN and MAX and later compare them to determined if a value
changed. OLAP avoids need for self join when we want the detail and
the Summary as well. Very handy alternative to Grouping Sets (GROUP
BY with ROLLUP), because we don't have to code every single column
in a GROUP BY to retain the detail rows.
These OLAP functions/specifications are applied to the result
set of any GROUP BY clause in the SQL subquery, so that means an
Aggregate function can appear in the PARTITION BY and ORDER
BY clauses, but we cannot use Aggregate function on the result of
the OLAP spec directly. To do that need another subquery.
On other hand, a SELECT DISTINCT eliminates duplicates after the
OLAP functions have already been applied to the result rows. If use
a ROW_NUMBER function, should not get any duplicates unless
PARTITION BY columns are not selected in the result set.
Hope I have covered it in great detail. However there is a lot
more functionality by specifying limited moving windows that I have
omitted this time, for fear of getting too complex, and being a
OLAP specs are extremely powerful to do things with less SQL
than we used to have to write, and originally we had to do horrible
things to count the number of rows preceding each result row. In
the old days, before recursion was available, some methods for this
performed quite badly. Having the RDS Sort do it for us is massive.
Means recursive SQLs merely to count rows, are not needed.
I use OLAP functions to assign seq numbers to aspects of an
Access Path set of rows in the PLAN_TABLE, and can use them to
detect access path changes from one REBIND to the next, by sorting
on access path columns, with a much simpler SQL than we had to code
with outer Joins or NOT EXISTS subqueries in the past. Not only,
that but the performance is vastly better when perhaps a PLAN_TABLE
did not have the best set of indexes to support NOT EXISTS subquery
probes. If one wants a very smart clean up of PLAN_TABLE where
delete multiple old successive REBINDs that resulted in same access
paths, but to keep history of access path change for say one year
(without all the duplicates), then OLAP functions can help achieve
it without need to build additional indexes. OLAP is perfect for
detecting changes with relatively complex specifications. Detecting
the type change (which column changed) that occurred between
successive history rows is possible too, using OLAP with self joins
or OLAP only with moving window specs.
DB2 Application Performance Specialist
CPT Global Ltd