Regarding DENSE_RANK() OVER(Partition and order by)

Nanthakumar Yoganathan

Regarding DENSE_RANK() OVER(Partition and order by)

Hi Team,

I have a confusion over DENSE_RANK()..

I have a query  DENSE_RANK()  with PARTITION_BY and ORDER BY... In this case,  the rank number always based on PARTITION_BY or ORDER BY...

In precise, How DENSE_RANK() react if it has PARTITION_BY and ORDER BY together.

Regards,

Nanthakumar Yoganathan

+91-9942512341

alain pary

RE: Regarding DENSE_RANK() OVER(Partition and order by)
(in response to Nanthakumar Yoganathan)

Hello , 

here is an example 

SELECT DISTINCT RANK() OVER (
ORDER BY CO.TBCREATOR, CO.TBNAME, CO.NAME) AS STMT_ID,
CO.TBCREATOR,
CO.TBNAME,
CO.NAME AS COLNAME
FROM SYSIBM.SYSCOLUMNS CO

 

if you submit it the stmt_id will go from 1 to the number of row 

 

if you modify the query 

SELECT DISTINCT RANK() OVER (PARTITION BY CO.TBCREATOR
ORDER BY
CO.TBNAME,
CO.NAME) AS STMT_ID,
CO.TBCREATOR,
CO.TBNAME,
CO.NAME AS COLNAME
FROM SYSIBM.SYSCOLUMNS CO

at each new creator , you start at 1 until the end of the rows with the same creator.

I hope it help 

 

regards 

 

Alain

James Campbell

Regarding DENSE_RANK() OVER(Partition and order by)
(in response to alain pary)
Noting, of course, that DENSE_RANK numbers always increase by 1 (ie no holes), so the
maximum number is not the number of rows, but the number of distinct values of the 'order
by' columns within each partition.

As there should not be duplicate (CO.TBCREATOR, CO.TBNAME, CO.NAME) values, this
wouldn't show up in Alain's examples, but it would with
RANK() OVER (PARTITION BY CO.TBCREATOR
ORDER BY CO.TBNAME) AS TABLE_ID
and
DENSE_RANK() OVER(....

James Campbell


On 19 Sep 2018 at 5:01, alain pary wrote:

>
> Hello , 
> here is an example 
> SELECT DISTINCT RANK() OVER (
> ORDER BY CO.TBCREATOR, CO.TBNAME, CO.NAME) AS STMT_ID,
> CO.TBCREATOR,
> CO.TBNAME,
> CO.NAME AS COLNAME
> FROM SYSIBM.SYSCOLUMNS CO
>  
> if you submit it the stmt_id will go from 1 to the number of row 
>  
> if you modify the query 
> SELECT DISTINCT RANK() OVER (PARTITION BY CO.TBCREATOR
> ORDER BY
> CO.TBNAME,
> CO.NAME) AS STMT_ID,
> CO.TBCREATOR,
> CO.TBNAME,
> CO.NAME AS COLNAME
> FROM SYSIBM.SYSCOLUMNS CO
> at each new creator , you start at 1 until the end of the rows with the same creator.
> I hope it help 
>  
> regards 
>  
> Alain
>


---
This email has been checked for viruses by AVG.
https://www.avg.com

Michael Hannan

RE: Regarding DENSE_RANK() OVER(Partition and order by)
(in response to Nanthakumar Yoganathan)

I use DENSE_RANK a lot with both PARTITION BY and ORDER BY clauses.

PARTITION BY is like a GROUP BY in some ways but does not summarize the rows. Every change in the PARTITION BY Column values set, causes the DENSE_RANK row numbers to restart at 1 again. So we are numbering within each the set of rows with a common value set for the PARTITION BY columns.

Within that Group, the ORDER BY specifies what sequence to sort the individual rows (duplicate on the PARTITION BY key). Dense Rank then attaches sequential numbers starting at 1 and incrementing by 1 (with no gaps), such that if multiple sorted rows within  each group value have the same ORDER BY values (are duplicates on ORDER BY key), then they get the same DENSE_RANK number. ROW_NUMBER function would have given each sorted row a different number (within each group PARTITION BY key value) regardless of having the same value of the ORDER BY key.

I think RANK will be used a lot less, as it also gives duplicate rows the same rank, but then skips some potential generated sequential numbers to compensate for earlier duplicates in the ORDER BY. i.e. could have sequence numbers: 1, 1, 1, 4, 4, 6, 7, 8, 8, 10, etc.

Other Aggregating functions are available, e.g. to calculate Max or Sum of a group but retaining the detail rows. Unfortunately, can do COUNT but not COUNT(DISTINCT col)  on z/OS. Small weakness. Would have caused too much complexity of the sort requirements. Can take MIN and MAX and later compare them to determined if a value changed. OLAP avoids need for self join when we want the detail and the Summary as well. Very handy alternative to Grouping Sets (GROUP BY with ROLLUP), because we don't have to code every single column in a GROUP BY to retain the detail rows.

These OLAP functions/specifications are applied to the result set of any GROUP BY clause in the SQL subquery, so that means an Aggregate function can appear in the PARTITION BY and ORDER  BY clauses, but we cannot use Aggregate function on the result of the OLAP spec directly. To do that need another subquery.

On other hand, a SELECT DISTINCT eliminates duplicates after the OLAP functions have already been applied to the result rows. If use a ROW_NUMBER function, should not get any duplicates unless PARTITION BY columns are not selected in the result set.

Hope I have covered it in great detail. However there is a lot more functionality by specifying limited moving windows that I have omitted this time, for fear of getting too complex, and being a tangent.

OLAP specs are extremely powerful to do things with less SQL than we used to have to write, and originally we had to do horrible things to count the number of rows preceding each result row. In the old days, before recursion was available, some methods for this performed quite badly. Having the RDS Sort do it for us is massive. Means recursive SQLs merely to count rows, are not needed.

I use OLAP functions to assign seq numbers to aspects of an Access Path set of rows in the PLAN_TABLE, and can use them to detect access path changes from one REBIND to the next, by sorting on access path columns, with a much simpler SQL than we had to code with outer Joins or NOT EXISTS subqueries in the past. Not only, that but the performance is vastly better when perhaps a PLAN_TABLE did not have the best set of indexes to support NOT EXISTS subquery probes. If one wants a very smart clean up of PLAN_TABLE where delete multiple old successive REBINDs that resulted in same access paths, but to keep history of access path change for say one year (without all the duplicates), then OLAP functions can help achieve it without need to build additional indexes. OLAP is perfect for detecting changes with relatively complex specifications. Detecting the type change (which column changed) that occurred between successive history rows is possible too, using OLAP with self joins or OLAP only with moving window specs.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 20, 2018 - 08:26 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 08:27 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 08:33 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 09:04 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 09:08 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 09:12 AM (Europe/Berlin)