Finding duplicate values in column and then Increment a pesudo COUNT column by 1

Jim Aloye

Finding duplicate values in column and then Increment a pesudo COUNT column by 1

I am hoping someone can provide some insight on the following problem, specifically if it can be solved using an AD-HOC Query or similar structure.

If it can be solved via an AD-HOC Query or similar structure, I would appreciate any advice, suggestions, or recommendations on how I might approach writing the query.

 

Here is the problem:

 

I am using DB2 for iSeries version V7R2M0

 

I have a table with approximately 100K records.

 

I have a query that combines one column from one table (we will call this table FOLSOM.DYNACRM002) with two columns from another table (we will call this table FOLSOM.BFP002) via a JOIN.

 

The columns I am concerned with are as follows:

 

TABLENAME                                       FIELD NAME          TYPE

FOLSOM.DYNACRM002                    IDNUM                      VARCHAR

FOLSOM.BFP002                               PDATE                      NUMERIC

FOLSOM.BFP002                               PTIME                       NUMERIC

 

Many times, there will only be one row where a given IDNUM exists.

 

However, the more likely result is that there can be anywhere from 2 to several thousand rows that contain the SAME IDNUM value, but the corresponding columns of PDATE and PTIME are different.

 

“Somehow” and this is the problem – somehow, I need to build into the query itself, an extra ‘fake column’ named IDNUM_SEQ that stores a sequence number that ‘appears’ based on the ‘rules’ described below.

 

The way this sequence number needs to work is that if there is only one row in the returned data that has a given IDNUM the IDNUM_SEQ column (pseudo COUNT COLUMN) will contain a 1

 

However, when multiple rows are returned that contain the same IDNUM (but different PDATE and PTIME values obviously), the topmost row gets a 1 in the IDNUM_SEQ column (pseudo COUNT COLUMN) , followed by a 2 in the next one, and a 3 in the next one, and so forth.

 

If I was able to achieve my goal and solve this problem, an example of what the returned data would look like complete with sequence numbers can be seen in the image attached below named DB2_Data_Sample_Annotated.png  (pardon my inability to center properly right justify the numbers I manually added using an art/drawing app).

 

The ad-query I am using currently that returns the data I am discussing here is listed below

 

DELETE FROM FOLSOM.DYNACRM004

;

INSERT INTO FOLSOM.DYNACRM004 (

SELECT DISTINCT

RTRIM(AACPCMS) AS AACPCMS

,AACSCD

,AACSYR

,AACSSQ

,TRIM(AAMATTR) AS AAMATTR

FROM XTERM.CPP100 CRIMAS

)

;

SELECT

TRIM(CRIMAS.AAMATTR) AS IDNUM

, ' ' AS IDNUM_SEQ

,FOFMAS.ABPYDT AS PDATE

,FOFMAS.ABADTM AS PTIME

FROM FOLSOM.BFP002 FOFMAS

INNER JOIN FOLSOM.DYNACRM004 CRIMAS ON CRIMAS.AACSCD = FOFMAS.ABCSCC

AND CRIMAS.AACSYR = FOFMAS.ABCSYR

AND CRIMAS.AACSSQ = FOFMAS.ABCSSQ

WHERE FOFMAS.ABCSCC <> 9

AND FOFMAS.ABCSSQ <> 99999

AND (FOFMAS.ABCSSQ <> 0 OR FOFMAS.ABFSQ# <> 0)

ORDER BY TRIM(CRIMAS.AAMATTR) ASC, FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC

;

Hopefully what I am looking to do makes sense.

 

My questions are:

 

  • Can this be done via an AD-HOC query or some other AD-HOC structure I can run in the i5 native ‘run a sql script or query – tool’ or via DBArtisan or some other query tool. Even if this query must be broken into two, queries, and occur across two steps, making use of a ‘staging or holding table’, that is acceptable.

 

  • If the answer to the above question is YES, how might I approach writing the query or set of queries w/ staging or holding table to do this?

 

I am a beginner/almost-intermediate DB2 user so I can create tables and insert/update data, write queries without a problem. I just need some help getting determining how to approach this problem.

 

The most I could find about this was an article in StackOverflow that deals with Microsoft SQL but not DB2 which I posted below - https://stackoverflow.com/questions/14411941/on-finding-duplicate-values-in-column-of-sql-table-increment-a-count-column-by 

 

Any guidance, advice, suggestions, or even help applying what's in that stack overflow article to the DB2 problem described above would be greatly appreciated.

Attachments

  • DB2_Data_Sample_Annotated.png (40.1k)
Edited By:
Jim Aloye[Organization Members] @ Aug 01, 2017 - 11:19 AM (America/Eastern)
Jim Aloye[Organization Members] @ Aug 01, 2017 - 11:20 AM (America/Eastern)
Jim Aloye[Organization Members] @ Aug 01, 2017 - 11:57 AM (America/Eastern)
Jim Aloye[Organization Members] @ Aug 01, 2017 - 11:58 AM (America/Eastern)
Jim Aloye[Organization Members] @ Aug 01, 2017 - 12:28 PM (America/Eastern)

Jim Aloye

RE: Finding duplicate values in column and then Increment a pesudo COUNT column by 1
(in response to Jim Aloye)

SOLVED MY OWN PROBLEM!

 

Here is the line that did it:

ROW_NUMBER() OVER (PARTITION BY CRIMAS.AAMATTR ORDER BY FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC) AS ROWNUM

 

See in query below:

DELETE FROM FOLSOM.DYNACRM004

;

INSERT INTO FOLSOM.DYNACRM004 (

SELECT DISTINCT

RTRIM(AACPCMS) AS AACPCMS

,AACSCD

,AACSYR

,AACSSQ

,TRIM(AAMATTR) AS AAMATTR

FROM XTERM.CPP100 CRIMAS

)

;

SELECT

TRIM(CRIMAS.AAMATTR) AS IDNUM

,ROW_NUMBER() OVER (PARTITION BY CRIMAS.AAMATTR ORDER BY FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC) AS ROWNUM

,FOFMAS.ABPYDT AS PDATE

,FOFMAS.ABADTM AS PTIME

FROM FOLSOM.BFP002 FOFMAS

INNER JOIN FOLSOM.DYNACRM004 CRIMAS ON CRIMAS.AACSCD = FOFMAS.ABCSCC

AND CRIMAS.AACSYR = FOFMAS.ABCSYR

AND CRIMAS.AACSSQ = FOFMAS.ABCSSQ

WHERE FOFMAS.ABCSCC <> 9

AND FOFMAS.ABCSSQ <> 99999

AND (FOFMAS.ABCSSQ <> 0 OR FOFMAS.ABFSQ# <> 0)

ORDER BY TRIM(CRIMAS.AAMATTR) ASC, FOFMAS.ABPYDT ASC , FOFMAS.ABADTM ASC