Expression based indices

John Unger

Expression based indices

I am exploring the possibility of an expression-based index to improve query response time. What I would like to do is index on the maximum setup-date of a record, and have tried the following:

CREATE INDEX "TESTDB"."X1E_SMASTER"

      ON "TESTDB"."TMASTER"

      ("ID_NO"          ASC,

      MAX("SETUP_DATE"),

      "READY_STATE"   ASC)

      NOT CLUSTER

      USING STOGROUP GTTEST01

      PRIQTY 44640

      SECQTY 2880

      BUFFERPOOL BP1     

      PIECESIZE 2097152 K;

However, when I try to execute this (in Data Studio) I receive this message:

"AN AGGREGATE FUNCTION IS NOT VALID IN THE CONTEXT IN WHICH IT WAS INVOKED. SQLCODE=-120,..."

I know MAX is considered to be an aggregate function, although I was thinking of it as a "column function." I was hoping it would be allowed in this context .

The table is used by hundreds of applications, and in most of them to get the most recent record the WHERE clause contains a sub-SELECT requesting the MAX SETUP-DATE for that ID number and the state of readiness. Is there an alternate path to my goal of eliminating the sub-SELECT and improving response time?
 Thank you.

John Wm. Unger
DBA III, TDCJ-ID/ITD
936-437-1979
[login to unmask email]

 

Edited By:
John Unger[Organization Members] @ Mar 06, 2017 - 04:25 PM (America/Central)

Sam Baugh

Expression based indices
(in response to John Unger)
select setup_date from testdb.tmaster where id = ? order by setup_date desc
fetch first row only

On Mon, Mar 6, 2017 at 4:17 PM, John Unger <[login to unmask email]> wrote:

> I am exploring the possibility of an expression-based index to improve
> query response time. What I would like to do is index on the maximum
> setup-date of a record, and have tried the following:
>
> CREATE INDEX "TESTDB"."X1E_SMASTER"
>
> ON "TESTDB"."TMASTER"
>
> ("ID_NO" ASC,
>
> MAX("SETUP_DATE"),
>
> "READY_STATE" ASC)
>
> NOT CLUSTER
>
> USING STOGROUP GTTEST01
>
> PRIQTY 44640
>
> SECQTY 2880
>
> BUFFERPOOL BP1
>
> PIECESIZE 2097152 K;
>
> However, when I try to execute this (in Data Studio) I receive this
> message:
>
> "AN AGGREGATE FUNCTION IS NOT VALID IN THE CONTEXT IN WHICH IT WAS
> INVOKED. SQLCODE=-120,..."
>
> Is MAX considered an aggregate function, similar to SUM, COUNT, etc?
>
> The table is used by hundreds of applications, and in most of them to get
> the most recent record the WHERE clause contains a sub-SELECT requesting
> the MAX SETUP-DATE for that ID number and the state of readiness. Is there
> an alternate path to my goal of eliminating the sub-SELECT and improving
> response time?
> Thank you.
> John Wm. Unger
> DBA III, TDCJ-ID/ITD
> 936-437-1979
> [login to unmask email]
>
>
>
> -----End Original Message-----
>

Nadir Doctor

Expression based indices
(in response to Sam Baugh)
Hi John,

Yes, the max function is an aggregate like others - for a set, returns a
single value.

Sam's query should do the trick - DB2 on distributed platforms allows
reverse scans. With this functionality, one index can be used to determine
both the minimum and maximum values quickly.


Best Regards,
Nadir



On Mon, Mar 6, 2017 at 4:29 PM, Sam Baugh <[login to unmask email]> wrote:

> select setup_date from testdb.tmaster where id = ? order by setup_date
> desc fetch first row only
>
> On Mon, Mar 6, 2017 at 4:17 PM, John Unger <[login to unmask email]> wrote:
>
>> I am exploring the possibility of an expression-based index to improve
>> query response time. What I would like to do is index on the maximum
>> setup-date of a record, and have tried the following:
>>
>> CREATE INDEX "TESTDB"."X1E_SMASTER"
>>
>> ON "TESTDB"."TMASTER"
>>
>> ("ID_NO" ASC,
>>
>> MAX("SETUP_DATE"),
>>
>> "READY_STATE" ASC)
>>
>> NOT CLUSTER
>>
>> USING STOGROUP GTTEST01
>>
>> PRIQTY 44640
>>
>> SECQTY 2880
>>
>> BUFFERPOOL BP1
>>
>> PIECESIZE 2097152 K;
>>
>> However, when I try to execute this (in Data Studio) I receive this
>> message:
>>
>> "AN AGGREGATE FUNCTION IS NOT VALID IN THE CONTEXT IN WHICH IT WAS
>> INVOKED. SQLCODE=-120,..."
>>
>> Is MAX considered an aggregate function, similar to SUM, COUNT, etc?
>>
>> The table is used by hundreds of applications, and in most of them to get
>> the most recent record the WHERE clause contains a sub-SELECT requesting
>> the MAX SETUP-DATE for that ID number and the state of readiness. Is there
>> an alternate path to my goal of eliminating the sub-SELECT and improving
>> response time?
>> Thank you.
>> John Wm. Unger
>> DBA III, TDCJ-ID/ITD
>> 936-437-1979 <(936)%20437-1979>
>> [login to unmask email]
>>
>>
>>
>> -----End Original Message-----
>

John Unger

RE: Expression based indices
(in response to Nadir Doctor)

I was afraid of that.

Here is what a standard WHERE clause looks like on this table:

FROM TESTDB.TMASTER A                  

WHERE A.ID_NO > '00000500'

  AND

      A.SETUP_DATE =            

      (SELECT MAX(B.SETUP_DATE)

             FROM TESTDB.TMASTER B                          

            WHERE B.ID_NO = A.ID_NO 

              AND B.READY_STATE IN (' ', 'N')    

          )                                         

    AND   (                                        

                 (A.ID_STATUS IN ('A', 'X'))        

              OR                                          

                (                                        

                 (A.RD_STATUS LIKE 'T%')      

                 AND                                     

                 (A.RD_DATE >= :LAST-NINETY-DAYS)  

                )

             )                                

  FOR FETCH ONLY          

This table is frequently read through to get all the active ID_NOs, and some of our EXPLAINS are looking very ugly. We are working to clean this old thing up and normalize it, but I was hoping that an index change would serve as a stopgap until we can get other problems fixed.

Thanks for your input!

John Wm. Unger

DBA III, TDCJ-ID/ITD
936-437-1979
[login to unmask email]

Joe Geller

RE: Expression based indices
(in response to John Unger)

John,

Are the active IDs based on READY_STATE or ID_STATUS?  How many rows are in the desired READY_STATE?  If not many, then you could have an index on READY_STATE, ID_NO, SETUP_DATE DESC.   This will only perform well if there are not many rows at a time with READY_STATE IN(' ','N").   The first row fetched for each ID will have the max SET_UP_DATE.  You would still need the sub_select so that DB2 will only take the first one.

Joe

In Reply to John Unger:

I was afraid of that.

Here is what a standard WHERE clause looks like on this table:

FROM TESTDB.TMASTER A                  

WHERE A.ID_NO > '00000500'

  AND

      A.SETUP_DATE =            

      (SELECT MAX(B.SETUP_DATE)

             FROM TESTDB.TMASTER B                          

            WHERE B.ID_NO = A.ID_NO 

              AND B.READY_STATE IN (' ', 'N')    

          )                                         

    AND   (                                        

                 (A.ID_STATUS IN ('A', 'X'))        

              OR                                          

                (                                        

                 (A.RD_STATUS LIKE 'T%')      

                 AND                                     

                 (A.RD_DATE >= :LAST-NINETY-DAYS)  

                )

             )                                

  FOR FETCH ONLY          

This table is frequently read through to get all the active ID_NOs, and some of our EXPLAINS are looking very ugly. We are working to clean this old thing up and normalize it, but I was hoping that an index change would serve as a stopgap until we can get other problems fixed.

Thanks for your input!

John Wm. Unger

DBA III, TDCJ-ID/ITD
936-437-1979
[login to unmask email]

John Unger

RE: Expression based indices
(in response to Joe Geller)

Joe,
Right now there are >2M records in this table. The table was created in the mid-90's in a rush to get stuff out of KSDS files into DB2 - without time allowed for normalization. It has gone downhill from there, but we have a DBA shop now and management approval to improve it. I say that to explain some of what follows.

It is clustered by NAME into 10 partitions. The unique index is on one of two (yes, two) ID numbers + SETUP_DATE (desc) and READY_STATE (asc). Unfortunately, because of business rules, this particular application (and most applications) use the other ID number, which also has an index with it, SETUP_DTE (desc) and READY_STATE (asc).

The performance, over time, has gone to perdition in a handbasket, and we are trying to stop the bleeding while we get the major issues addressed like the space and leaf issues for the tablespace, indexes, the nature of the data and eventually the table definition itself.

Some of the fixes we are implementing to the tablespace and indexspaces ought to help us lower the cpu costs in the interim. I had just been told about the expression based indices and was hoping that would serve as a bandage while we fix the other problems.

Thank you!
John Wm. Unger
DBA III, TDCJ-ID/ITD
936-437-1979
[login to unmask email]

Peter Vanroose

RE: Expression based indices
(in response to John Unger)

John,

Two reasons why this is not accepted by DB2:
-   column functions cannot be used in an index on expression, only scalar functions
-   an index on expression can only have a single column / field

In Reply to John Unger:

[...] CREATE INDEX "TESTDB"."X1E_SMASTER"  ON "TESTDB"."TMASTER"

      ("ID_NO"          ASC, MAX("SETUP_DATE"), "READY_STATE"   ASC)  [...]

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/

Joe Geller

RE: Expression based indices
(in response to Peter Vanroose)

Actually, an index on expression can have multiple columns and expressions. But it can't be an aggregate function.

Joe

In Reply to Peter Vanroose:

John,

Two reasons why this is not accepted by DB2:
-   column functions cannot be used in an index on expression, only scalar functions
-   an index on expression can only have a single column / field

In Reply to John Unger:

[...] CREATE INDEX "TESTDB"."X1E_SMASTER"  ON "TESTDB"."TMASTER"

      ("ID_NO"          ASC, MAX("SETUP_DATE"), "READY_STATE"   ASC)  [...]

--      Peter Vanroose
        ABIS Training & Consulting,
        Leuven, Belgium.
        http://www.abis.be/