Flag consecutive instances of a given value

Ran Mo

Flag consecutive instances of a given value

Hello,

I need to flag records with a given value showing up consecutively by an ID. How can I get this with DB2 SQL? Thanks.

 

MR


Obs    id    name

  1     1     a
  2     1     a
  3     2     a
  4     2     b
  5     2     b
  6     2     b
  7     2     c
  8     3     b
  9     3     c
 10     3     b
 11     4     a
 12     4     b


Obs    id    name      flag

  1     1     a          .
  2     1     a          .
  3     2     a          .
  4     2     b          1
  5     2     b          1
  6     2     b          1
  7     2     c          .
  8     3     b          1
  9     3     b          1
 10     3     c          .
 11     4     a          .
 12     4     b          .

 

Walter Janißen

AW: Flag consecutive instances of a given value
(in response to Ran Mo)
Hi

What about RANK() OVER (PARTITION BY id, name). rows with the same RANK-value can be flagged.

Kind regards
Walter Janißen [standard_IBM+Champ+7+Yr+Analytics]

ITERGO Informationstechnologie GmbH
Anwendungsentwicklung
Technische Anwendungsarchitektur
Victoriaplatz 2
D-40198 Düsseldorf
[login to unmask email]<mailto:[login to unmask email]>

ITERGO Informationstechnologie GmbH
Vorsitzender des Aufsichtsrats: Christian Diedrich
Geschäftsführung: Dr. Bettina Anders (Vorsitzende),
Lothar Engelke, Ina Kirchhof, Dr. Michael Regauer
Sitz: Düsseldorf, Handelsregister: Amtsgericht Düsseldorf HRB 37996

Von: Ran Mo [mailto:[login to unmask email]
Gesendet: Mittwoch, 29. November 2017 16:45
An: [login to unmask email]
Betreff: [DB2-L] - Flag consecutive instances of a given value


Hello,

I need to flag records with a given value showing up consecutively by an ID. How can I get this with DB2 SQL? Thanks.



MR

Obs id name

1 1 a
2 1 a
3 2 a
4 2 b
5 2 b
6 2 b
7 2 c
8 3 b
9 3 c
10 3 b
11 4 a
12 4 b

Obs id name flag

1 1 a .
2 1 a .
3 2 a .
4 2 b 1
5 2 b 1
6 2 b 1
7 2 c .
8 3 b 1
9 3 b 1
10 3 c .
11 4 a .
12 4 b .



-----End Original Message-----
Attachments

  • image001.png (2.6k)

Ran Mo

RE: AW: Flag consecutive instances of a given value
(in response to Walter Janißen)

Forgot to mention. The search is for a given value ONLY. In this case, Name = 'b'. Consecutive instances of all other values should not be flagged.

Thanks.

Sam Baugh

RE: AW: Flag consecutive instances of a given value
(in response to Ran Mo)

This should help get you started, must join table back to itself:

with test_tbl
  as (select  1 as id, 'a' as name from sysibm.sysdummy1 union all
      select  1 as id, 'a' as name from sysibm.sysdummy1 union all
      select  2 as id, 'a' as name from sysibm.sysdummy1 union all
      select  2 as id, 'b' as name from sysibm.sysdummy1 union all
      select  2 as id, 'b' as name from sysibm.sysdummy1 union all
      select  2 as id, 'b' as name from sysibm.sysdummy1 union all
      select  2 as id, 'c' as name from sysibm.sysdummy1 union all
      select  3 as id, 'b' as name from sysibm.sysdummy1 union all
      select  3 as id, 'c' as name from sysibm.sysdummy1 union all
      select  3 as id, 'b' as name from sysibm.sysdummy1 union all
      select  4 as id, 'a' as name from sysibm.sysdummy1 union all
      select  4 as id, 'b' as name from sysibm.sysdummy1
     )
select a.*
     , case when b.id is null then '' else '1' end as flag
  from test_tbl a
  left join (
             select id, name, count(*)
               from test_tbl
              where name = 'b'
              group by id, name having count(*) > 1
            ) as b
    on b.id   = a.id
   and b.name = a.name
  with ur

Ran Mo

RE: AW: Flag consecutive instances of a given value
(in response to Sam Baugh)

Hi,

Thanks very much for the attempt. Yet it does not work with my data which has 3.5 million rows of unique id. I can't do the first part in the suggested way.

I am sure it can be done. I have strong SAS background. In SAS, it is simply a first.id and last.id data step. I am new with DB2 SQL. Just try to find the way to run similar query in SQL.

 

Philip Sevetson

AW: Flag consecutive instances of a given value
(in response to Ran Mo)
Is it possible to have non-consecutive instances within your grouping, and are those of concern to you, or do you want to ignore them?

DB2 tables are conceptually un-ordered, so if RANK or GROUP BY don’t work for you, you probably have to UNLOAD the data and process it as a sequential file using a programming language.

From: Ran Mo [mailto:[login to unmask email]
Sent: Wednesday, November 29, 2017 2:00 PM
To: [login to unmask email]
Subject: [DB2-L] - RE: AW: Flag consecutive instances of a given value


Forgot to mention. The search is for a given value ONLY. In this case, Name = 'b'. Consecutive instances of all other values should not be flagged.

Thanks.

-----End Original Message-----
**This e-mail, including any attachments, may be confidential, privileged, or otherwise legally protected. It is intended only for the addressee. If you received this e-mail in error or from someone who was not authorized to send it to you, do not disseminate, copy, or otherwise use this e-mail or its attachments. Please notify the sender immediately by reply e-mail and delete the e-mail from your system.**

Peter Vanroose

Re: Flag consecutive instances of a given value
(in response to Philip Sevetson)

Two other solutions:

- Write an SQL PL stored procedure which opens a cursor on your query with an appropriate ORDER BY, keeps track of the group you are in while FETCHing the cursor, and returns the wanted result through a result set (using a global temporary table)

- Use RANK() OVER (same ORDER BY, but additionally a PARTITION BY).

In Reply to Philip Sevetson:

DB2 tables are conceptually un-ordered, so if RANK or GROUP BY don’t work for you, you probably have to UNLOAD the data and process it as a sequential file using a programming language.

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

Peter Vanroose

Re: Flag consecutive instances of a given value
(in response to Ran Mo)

Here are three queries, one with GROUP BY and (self)JOIN, the other tw owith an aggregation OLAP function (not RANK() OVER, but instead COUNT(*) OVER).

I expect the third one to have the best performance since its memory requirements are lowest: just look ahead & behind a single row. The second query looks at the full group so it's very close to the first query, which on the other hand most likely will materialize the full CTE.

1:

WITH gb AS (SELECT id AS gid, 1 AS flag FROM MR
            WHERE name = 'b'
            GROUP BY id, name
            HAVING COUNT(*) > 1)
SELECT obs, id, name, flag
FROM   MR left join gb ON name='b' AND id=gid;

2:

SELECT obs,id,name,
       CASE WHEN name='b' AND COUNT(*) OVER (PARTITION BY id,name) > 1
            THEN 1
       END as flag
FROM MR;

3:

SELECT obs,id,name,
       CASE WHEN name='b' AND COUNT(*) OVER (PARTITION BY id,name ORDER BY 1 ROWS BETWEEN 1 preceding AND 1 following) > 1
            THEN 1
       END as flag
FROM MR;

P.S. I created the table as follows:

create table MR(obs int not null primary key, id int not null, name char(1) not null);
insert into MR values( 1,1,'a');
insert into MR values( 2,1,'a');
insert into MR values( 3,2,'a');
insert into MR values( 4,2,'b');
insert into MR values( 5,2,'b');
insert into MR values( 6,2,'b');
insert into MR values( 7,2,'c');
insert into MR values( 8,3,'b');
insert into MR values( 9,3,'c');
insert into MR values(10,3,'b');
insert into MR values(11,4,'a');
insert into MR values(12,4,'b');

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

In Reply to Ran Mo:

I need to flag records with a given value showing up consecutively by an ID. How can I get this with DB2 SQL?

Only names equal to 'b' need to be flagged.

MR


Obs    id    name

  1     1     a
  2     1     a
  3     2     a
  4     2     b
  5     2     b
  6     2     b
  7     2     c
  8     3     b
  9     3     c
 10     3     b
 11     4     a
 12     4     b


Obs    id    name      flag

  1     1     a          .
  2     1     a          .
  3     2     a          .
  4     2     b          1
  5     2     b          1
  6     2     b          1
  7     2     c          .
  8     3     b          1
  9     3     b          1
 10     3     c          .
 11     4     a          .
 12     4     b          .