Sliding window in a query

Ron van der Poel

Sliding window in a query

Hai,

a question.
I have two tables. DB2/zOS version 11

Contract-table, containing three attributes:
- contract-id
- sliding-window-number-of-weeks
- sliding-window-limit-amount

Transaction-table, , containing three attributes:
- date
- contract-id
- amount

Suppose we have in the contract-table a row with the next values:
- contract-id = 'AS123'
- sliding-window-number-of-weeks = 4
- sliding-window-limit-amount = 100.000 euro

So, this means that contract-id 'AS123' can spend 100.000 euro in 4 weeks time.

In the transaction table, we have two rows:
- date = 12-09-2018
- contract-id 'AS123'
- amount = 40.000 euro

- date = 18-09-2018
- contract-id 'AS123'
- amount = 60.000 euro

The weeks are sliding. We want to see the percentage used of the sliding-window-limit-amount of the contract.
Suppose the first week starts at 09-09-2018.

Then we get the next output:
Limit used in week 1: 40% (40.000 of 100.000)
Limit used in week 2: 100% (40.000 of week 1 + 60.000 of week 2 equals 100.000 limit of contract)
Limit used in week 3: 100% (sliding-weeks are 4)
Limit used in week 4: 100% (sliding-weeks are 4)
Limit used in week 5: 60% (sliding-weeks are 4, so here we look at week 2 up until 5. In week 2 60.000 euro is spent, is 60% of the limit)
Limit used in week 6: 0% (sliding-weeks are 4, so here we look at week 3 up until 6. In these weeks nothing is spent, so 0% of the limit of contract)

How can I get the percentages in a single query? Is there a function in DB2 z/OS for this?

Thanks
Ron

Sam Baugh

Sliding window in a query
(in response to Ron van der Poel)
perhaps something like the following would work, no guarantees:

select a.contract_id
, a.sliding_weeks
, a.sliding_limit
, sum(value(b.amount,0)) as amount
, sum(value(b.amount,0))/float(a.sliding_limit)*100.00 as limit_pct
from contract_table a
left join tran_table b
on b.contract_id = a.contract_id
and b.date between cast(? as date) and cast(? as date) + a.sliding weeks
group by a.contract_id
, a.sliding_weeks
, a.sliding_limit

On Thu, Sep 13, 2018 at 2:33 PM Ron van der Poel <[login to unmask email]>
wrote:

> Hai,
>
> a question.
> I have two tables. DB2/zOS version 11
>
> Contract-table, containing three attributes:
> - contract-id
> - sliding-window-number-of-weeks
> - sliding-window-limit-amount
>
> Transaction-table, , containing three attributes:
> - date
> - contract-id
> - amount
>
> Suppose we have in the contract-table a row with the next values:
> - contract-id = 'AS123'
> - sliding-window-number-of-weeks = 4
> - sliding-window-limit-amount = 100.000 euro
>
> So, this means that contract-id 'AS123' can spend 100.000 euro in 4 weeks
> time.
>
> In the transaction table, we have two rows:
> - date = 12-09-2018
> - contract-id 'AS123'
> - amount = 40.000 euro
>
> - date = 18-09-2018
> - contract-id 'AS123'
> - amount = 60.000 euro
>
> The weeks are sliding. We want to see the percentage used of the
> sliding-window-limit-amount of the contract.
> Suppose the first week starts at 09-09-2018.
>
> Then we get the next output:
> Limit used in week 1: 40% (40.000 of 100.000)
> Limit used in week 2: 100% (40.000 of week 1 + 60.000 of week 2 equals
> 100.000 limit of contract)
> Limit used in week 3: 100% (sliding-weeks are 4)
> Limit used in week 4: 100% (sliding-weeks are 4)
> Limit used in week 5: 60% (sliding-weeks are 4, so here we look at week 2
> up until 5. In week 2 60.000 euro is spent, is 60% of the limit)
> Limit used in week 6: 0% (sliding-weeks are 4, so here we look at week 3
> up until 6. In these weeks nothing is spent, so 0% of the limit of contract)
>
> How can I get the percentages in a single query? Is there a function in
> DB2 z/OS for this?
>
> Thanks
> Ron
>
>
> -----End Original Message-----
>

Michael Hannan

RE: Sliding window in a query
(in response to Sam Baugh)

In Reply to Sam Baugh:

perhaps something like the following would work, no guarantees:

select a.contract_id
, a.sliding_weeks
, a.sliding_limit
, sum(value(b.amount,0)) as amount
, sum(value(b.amount,0))/float(a.sliding_limit)*100.00 as limit_pct
from contract_table a
left join tran_table b
on b.contract_id = a.contract_id
and b.date between cast(? as date) and cast(? as date) + a.sliding weeks
group by a.contract_id
, a.sliding_weeks
, a.sliding_limit

Ron,

I liked Sam's solution, however I thought for fun to try using OLAP spec. but ran into the problem that an OLAP RANGE window spec requires use of a constant, where you wanted a variable range depending on the  contract_id. This means the OLAP solution would be more complex.

Still would be nice to be able to see the calculated percentage as at any date a row was inserted, rather than being an offset from a fixed parameter date. So I might alter Sam's SQL a little.

Select a.contract_id
, a.sliding_weeks
, a.sliding_limit
, B.date
, B.amount
, (SELECT SUM(FLOAT(b2.amount))/a.sliding_limit * 100  as Used_PCT
    FROM    tran_table b2
    WHERE b2.contract_id = a.contract_id
     AND      b2.date between B.date - (7 * a.sliding_weeks - 1) DAYS   AND   B.DATE
)
from contract_table a

left join tran_table b
on b.contract_id = a.contract_id

-- Optional predicates on tables
WHERE a. CONTRACT_id = ?    

 

I had to use a self join due to limitations of OLAP functions not allowing a variable range a.sliding_weeks directly.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 20, 2018 - 09:50 AM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 20, 2018 - 09:53 AM (Europe/Berlin)

Peter Vanroose

Re: Sliding window in a query
(in response to Michael Hannan)

I guess you could circumvent this limitation by technically using the OLAP windowing function with an "unbounded preceding" window, but combining that with a CASE expression in the windowed SUM(), which would essentially multiply the value with 1 for entries within the variable window, and with 0 for values outside that window.

Something like:

  SUM(b.amount * CASE WHEN b.date BETWEEN ... AND ... THEN 1 ELSE 0 END)
OVER (ORDER BY b.date ROWS UNBOUNDED PRECEDING)



In Reply to Michael Hannan:

[...] I liked Sam's solution, however I thought for fun to try using OLAP spec. but ran into the problem that an OLAP RANGE window spec requires use of a constant, where you wanted a variable range depending on the  contract_id [...]

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        https://www.abis.be/

Edited By:
Peter Vanroose[Organization Members] @ Sep 24, 2018 - 08:59 PM (Europe/Brussels)

Michael Hannan

Re: Sliding window in a query
(in response to Peter Vanroose)



In Reply to Peter Vanroose:

I guess you could circumvent this limitation by technically using the OLAP windowing function with an "unbounded preceding" window, but combining that with a CASE expression in the windowed SUM(), which would essentially multiply the value with 1 for entries within the variable window, and with 0 for values outside that window.

Something like:

  SUM(b.amount * CASE WHEN b.date BETWEEN ... AND ... THEN 1 ELSE 0 END)
OVER (ORDER BY b.date ROWS UNBOUNDED PRECEDING)


 

Prove me wrong Peter,
but I don't think that works unless you are looking for a single result row (specific date to end the window), like in the previous solution. Can't see how to avoid joining table to itself here. My solution tries to get the ongoing result  for multiple rows (multiple dates) (if it works). Maybe you did not notice that was the point of my solution.

In OLAP you specify a Partition to operate on and then can aggregate rows in the partition. You cannot directly do something that compares a row with other rows in the same partition to get the date difference, before deciding if contributes to the SUM or not. You might need a more complex multi step solution.

Your OLAP looks to be wanting a Cartesian product. Test it out though and prove me wrong. I am not saying its impossible but just not as simple as you suppose.

DB2 has to know how it is going to get the right rows to be summed, so that is why the bounds are not so dynamically variable.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 26, 2018 - 01:17 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 26, 2018 - 01:20 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 26, 2018 - 01:23 PM (Europe/Berlin)

Peter Vanroose

Re: Sliding window in a query
(in response to Michael Hannan)

Challenge accepted!

I'll have a closer look at it when I find a few minutes of spare time ;-)

I'm still believing it's possible without an extra self-join.

In Reply to Michael Hannan:

[...]  I don't think that works unless you are looking for a single result row (specific date to end the window), like in the previous solution. Can't see how to avoid joining table to itself here. [...]

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        https://www.abis.be/

Michael Hannan

Re: Sliding window in a query
(in response to Peter Vanroose)



In Reply to Peter Vanroose:

Challenge accepted!

I'll have a closer look at it when I find a few minutes of spare time ;-)

I'm still believing it's possible without an extra self-join.

Well that is a tough challenge. Imagine how the OLAP sliding window works. As it visits each successive row, it adds current row to the aggregated data, and it also has to take away aggregated data that has dropped out of the sliding window, I presume by re referencing the row to be taken out or by having two concurrent tranversers of the rows. So it has to know without too much complexity what is to be taken out of the aggregate.

Currently it knows to use a specific constant offset number of rows or specific constant value difference from the current row. The 2nd way is good except for the restriction that the window has to be a constant, not a variable window supplied from another table column. I guess the method does not want the possibility of trying to provide for a variable sized window.

Would be nice if they merely relaxed that to say the window should be fixed offset for each PARTITION BY value set.

I think maybe OLAP can  get around joining the table to itself, however not with a great performing solution I think. Possible using more than 1 OLAP.

When dealing with a big table that does not have suitable indexes, for desired self joins,  then multiple levels of Nested OLAPs can work well. I coded one to detect successive duplicate access paths from REBINDs in PLAN_TABLE for a cleanup SQL, to enable me to get rid of old access paths in a certain time range when they were successive duplicates only, without needing a correlated subquery or join to itself, just in case a site does not have the indexes I would like.

My attempt, had to use 2 OLAPs (I think 1 is not possible), nested CTEs and a UNION ALL, in order to get out of self join. Not worth it unless the right indexes are missing, I think. I will be impressed if you can achieve it much simpler or with something that works better than my attempt (I can't be sure it works with no test - UNBOUNDED PRECEDING is the default):

WITH CTE1 AS
(Select a.contract_id
 , a.sliding_weeks
 , a.sliding_limit
 , B.date
 , B.amount
 ,SUM(FLOAT(b2.amount)) OVER(
    PARTITION BY a.CONTRACT_id
    ORDER BY B.date) AS ACCUM1
 from contract_table a

left join tran_table b
 on b.contract_id = a.contract_id

-- Optional predicates on tables
 WHERE a. CONTRACT_id = ?   
)
,CTE2 AS
(SELECT 0 TYPE
       ,contract_id
       ,sliding_weeks
       ,sliding_limit
       ,date  DT
       ,amount
       ,ACCUM1
   FROM CTE1   
 UNION ALL      
 SELECT 1 TYPE  -- Rows for subtraction
       ,contract_id
       ,sliding_weeks
       ,sliding_limit
       ,date + (sliding_weeks*7) DAYS AS DT
       ,amount
       ,ACCUM1    
   FROM CTE1   
)
,CTE3 AS
(SELECT TYPE
       ,contract_id
       ,sliding_weeks
       ,sliding_limit
       ,DT
       ,amount
       ,ACCUM1 
-- Find max amount in Type 1 rows to be substracted from Type 0 rows.
-- Relies on substracted rows to be sorted to be just before the row substracted from

      ,MAX(TYPE*ACCUM1) OVER( 
          PARTITION BY CONTRACT_id
          ORDER BY DT, TYPE DESC) TO_SUBTRACT
)
SELECT contract_id
       ,sliding_weeks
       ,sliding_limit
       ,DT
       ,amount
       ,(ACCUM1 - TO_SUBTRACT)/sliding_limit*100 AS PERCENT
  FROM  CTE3    
 WHERE  TYPE = 0  -- avoid the subtraction rows in results.
 ;               

Note that CTE1 is materialised to a work file, since referenced twice, and probably for the OLAP anyway.

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Sep 29, 2018 - 12:59 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 29, 2018 - 01:49 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 29, 2018 - 01:50 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Sep 29, 2018 - 01:56 PM (Europe/Berlin)

Peter Vanroose

Re: Sliding window in a query
(in response to Michael Hannan)

Here it is: no need for a self-join, only a "self-union" of the tran_table: shifted by the nr of weeks specified in the contract table (and with negative amount, to cancel the amount from that point on in the running sum).

I've chosen to show percentages per day (instead of per week).
In both cases, the query requires a table with dates (or weeks) in the time span required.
In my query that's the recursive CTE called "dates" which produces dates from 1 Sept to 31 Dec. 2018. (Easy to adapt to other time spans or to weeks instead of days, of course.)

WITH dates(d) AS (SELECT CAST('2018-09-01' AS date) FROM sysibm.sysdummy1
UNION ALL
SELECT d + 1 DAY FROM dates WHERE d < '31.12.2018')
SELECT d, SUM(pct) OVER (order by d rows between unbounded preceding and current row)
FROM dates LEFT JOIN
(SELECT t.date AS dt, t.amount/c.limit_amount AS pct
FROM tran_table t JOIN contract_table c ON c.contract_id=t.contract_id
UNION ALL
SELECT t.date + (7*nr_weeks) DAYS AS dt, -t.amount/c.limit_amount AS pct
FROM tran_table t JOIN contract_table c ON c.contract_id=t.contract_id
) x ON d = dt
ORDER BY d

Providing for multiple contract_id's poses an extra challenge;
essentially, a PARTITION BY contract_id will be needed inside the 2nd argument of SUM (just before ORDER BY),
but it's a bit more complicated than that...

In terms of performance, the joins (to the contract table) could maybe be optimized a bit, but since that's just a 1-row table (or a small one, in case of multiple contract_id's) it does not matter too much I believe.


--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        https://www.abis.be/

Michael Hannan

Re: Sliding window in a query
(in response to Peter Vanroose)



In Reply to Peter Vanroose:

 

WITH dates(d) AS (SELECT CAST('2018-09-01' AS date) FROM sysibm.sysdummy1
UNION ALL
SELECT d + 1 DAY FROM dates WHERE d < '31.12.2018')
SELECT d, SUM(pct) OVER (order by d rows between unbounded preceding and current row)
FROM dates LEFT JOIN
(SELECT t.date AS dt, t.amount/c.limit_amount AS pct
FROM tran_table t JOIN contract_table c ON c.contract_id=t.contract_id
UNION ALL
SELECT t.date + (7*nr_weeks) DAYS AS dt, -t.amount/c.limit_amount AS pct
FROM tran_table t JOIN contract_table c ON c.contract_id=t.contract_id
) x ON d = dt
ORDER BY d

Providing for multiple contract_id's poses an extra challenge;
essentially, a PARTITION BY contract_id will be needed inside the 2nd argument of SUM (just before ORDER BY),
but it's a bit more complicated than that...

In terms of performance, the joins (to the contract table) could maybe be optimized a bit, but since that's just a 1-row table (or a small one, in case of multiple contract_id's) it does not matter too much I believe.

 

Very well done! Glad to learn a new technique and I am impressed. Yes you could have easily partitioned by Contract_id, so take that as a given. Not sure joining to dates was really needed. I can see you wanted the percentage though at any date, not just when a new input occurred. I am not too sure how a recursive loop performs like this. 

We can see that UNION (ALL) with aggregation can often avoid a self join. Overhead of every joined row is significant so in some queries the UNION with aggregate may even perform better. Testing required some day.
The OLAP function even allows a 1 to many type join to be replaced.

Table gets scanned twice so another option is to scan once and do Union on an intermediate result WF. 

I think the UNION ALL can be eliminated too by join to a dummy two row table to effect Union All to itself (still with double scan of input unless force materilaise it), but not sure if performs any better. Any join has a cost. This may not be better than original Table self join, and UNION ALL may out perform it. Depends a lot on access paths, materialised work files etc., and best of all, performance tests when unsure.

I omitted join to Dates as I don't see we need a row for every date. I also added predicates to specify oldest date of interest to avoid summing years of history if not wanted. Further predicate needed if want to omit the subtract rows

The bottom line is OLAP is more powerful than most of us think (I thought this was not quite possible), and could be an alternative to joins to summarised data. Maybe some generalised techniques can be formulated to overcome OLAP function restrictions. However maybe in the end we need a version that is understood easily. A query could be incomprehensible. LOL

With TWO as (select DISTINCT colno - 1 AS row_ind from SYSIBM.SYScolumns
where tbcreator = 'SYSIBM' and tbname = 'SYSKEYS'
and colno < 3 )
,contract_Tran as
(SELECT
t.date AS dt1

, t.amount , c.limit_amount , nr_weeks
, t.amount/c.limit_amount*100 AS pct
, row_ind
, 1 - 2*row_ind as sgn
FROM tran_table t
JOIN contract_table c
ON c.contract_id=t.contract_id
JOIN TWO t2 on t2.row_ind >= 0
WHERE contract_id = ?
AND t.date >= DATE(:HVdt) - (7*nr_weeks) days
AND t.date <= CURRENT DATE - ((row_ind)*7*nr_weeks) days

)
,AGG AS
(select contract_id, Limit_amount, sgn*t.amount as change, Dt1

,Dt1 + (row_ind*7*nr_weeks) days as DT
,SUM(sgn*pct) OVER (partition by contract_id
order by Dt1 + (row_ind*7*nr_weeks) days
rows between unbounded preceding and current row
) AS slide_pct
)
SELECT * FROM AGG
WHERE DT >= :HVDT
order by dt desc, row_ind desc
Fetch first 100 rows only
with ur;

Michael Hannan,
DB2 Application Performance Specialist
CPT Global Ltd

Edited By:
Michael Hannan[Organization Members] @ Oct 02, 2018 - 12:11 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Oct 02, 2018 - 12:28 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Oct 02, 2018 - 12:30 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Oct 02, 2018 - 01:20 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Oct 02, 2018 - 01:28 PM (Europe/Berlin)
Michael Hannan[Organization Members] @ Oct 02, 2018 - 01:35 PM (Europe/Berlin)