User-Defined Table Functions - It's All in the SQL!

DB2 is rich with built-in functions. There are simple scalar functions (LENGTH, SQRT, ABS), functions that deal with columns of data and grouping (AVG, SUM, OVER), row functions and specialized statistical functions (CORRELATION, STDDEV).

But it seems that with all of these functions in the DB2 library, there just are never enough! You can find all sorts of functions on the web that users have created to include as part of their standard SQL. And writing these functions aren't that difficult. Here is a single line example of the universal 'Hello World' function to concatenate to your favorite character string.

CREATE FUNCTION HELLO_WORLD() RETURNS VARCHAR(15) RETURN 'Hello World';

Using it in an SQL statement is the same as using any regular DB2 built-in function:

VALUES HELLO_WORLD();

 

1
-----------------------------
Hello World

Imagine the complex SQL you could now write with your unique "Hello World" function! Well, perhaps not that useful, but you get the idea of what you could potentially do with your own user-defined functions. 

Functions are classified into four types: scalar, column, row, and table.

Scalar Functions

Scalar functions are identical to the HELLO_WORLD example. You can pass parameters to your function and it returns a single value. The body of the function can contain a variety of code (written in your favorite language like Java, C, SQL/PL or PLSQL if you are using Oracle compatibility). The complexity is limited only by your patience in debugging your own code!

Column Functions

Column functions are the built-in functions that DB2 provides to do sums, averages, min, max, etc. within a column. Prior to DB2 Version 11.1, you couldn’t write your own column functions, but you could override the existing functions to work on new data types. DB2 provides the ability to create your own user-defined data types (MONEY instead of DEC(9,2)). This capability was popularly known as 'Object-relational' SQL. The thought was that everyone was going to extend the data types that DB2 could support and integrate that with some of the object-oriented languages.

You probably wondered where all that went. Ends up that none of the popular query tools or development languages really embraced user-defined data types in the database. And end-users wondered why queries like SALARY > 30000 didn't work when "30000" sure looked like a money data type.

The following SQL creates the new data type called MONEY and creates a table with the new type and inserts one row into it.

CREATE DISTINCT TYPE MONEY AS INT WITH COMPARISONS;
 
CREATE TABLE PAY (SALARY MONEY);
 
INSERT INTO PAY VALUES 40000, 50000, 60000, 70000;

 A SELECT statement works, as we would expect:

SELECT * FROM PAY;

 

SALARY
-----------
      40000
      50000
      60000
      70000

At this point a developer would think they are really clever because now whenever a salary field is used in any table, they could use MONEY as the data type and be assured that it will always be an integer column. 

Things fall apart quickly. A naïve end-user tries to select a count of salaries that are greater than 40000. 

SELECT COUNT(*) FROM PAY WHERE SALARY > 40000;

 

SQL0401N  The data types of the operands for the operation ">" are not
compatible or comparable.  SQLSTATE=42818

What? On the surface there appears to be nothing wrong with the 40,000. It's an integer, so why does it cause this error? Well, 40,000 is not a MONEY data type. DB2 sees it as an integer value so the comparison between a MONEY data type and an integer is not valid. The solution to this problem is to use MONEY as a CAST function. 

SELECT COUNT(*) FROM PAY WHERE SALARY > MONEY(40000);

And now we get the proper results when the re-run the query.

1
-----------
          3

This is why object-relational never caught on. Any end-user tool that you might use will have no clue that the SALARY column requires the use of a cast function. More than likely, other developers who decided to write an application against this table will probably encounter errors as well and spend too much time debugging this data type mismatch.

Now that I've warned you of the perils of user-defined types, I can now explain column functions. You can extend the existing functions to work against your user-defined data type. In our previous example, trying to get the average salary for the table doesn't work.

SELECT AVG(SALARY) FROM PAY;

 

SQL0440N  No authorized routine named "AVG" of type "FUNCTION" having
compatible arguments was found.  SQLSTATE=42884

Another setback I'm afraid. You have to overload the existing AVG function to take the MONEY data type as input. Here you create a column function by overloading the existing AVG function. 

CREATE FUNCTION AVG (MONEY)
       RETURNS DECFLOAT
       SOURCE  SYSIBM.AVG(INTEGER);

Re-issuing the SELECT command, we now get the proper average without the error message.

1
-------------------
              55000

 

User-Defined Aggregation Functions (New!)

DB2 Version 11.1 introduces User-defined Aggregation Functions. The new CREATE FUNCTION (aggregate interface) statement allows you to create your own aggregate functions.  An aggregate function returns a single value that is the result of an evaluation of a set of like values, such as those in a column within a set of rows. You use your choice of programming language and create four routines to cover the following steps during the execution of your function:

  • INITIATE
  • ACCUMULATE
  • MERGE
  • FINALIZE

This is feature will be delivered when DB2 Version 11.1 is made generally available. Look for future details on this exciting extension to user-defined functions!

Row Functions

A row function is used in conjunction with user-defined data types. User-defined data types are not restricted to single values. For instance, a developer could create a COMPENSATION data type that consists of three components: SALARY, BONUS, and COMMISSION.

CREATE TYPE COMPENSATION AS
  (
  SALARY INT,
  COMM   INT,
  BONUS  INT
  )
  MODE DB2SQL WITH FUNCTION ACCESS INSTANTIABLE;

A table can contain this structured data type as one of its columns. 

CREATE TABLE TEMP_EMPLOYEE (
  EMPNO INT,
  PAY COMPENSATION
);

So how do you insert a value into the COMPENSATION column? You need to use a row constructor in order to generate a value to insert into this column. Here is a sample constructor: 

COMPENSATION()..SALARY(50000)..COMM(5000)..BONUS(10000)

 Yuk! Yes, that is the syntax for inserting a structured data type into a table.  A row function can be defined to make this look a little less cryptic.

CREATE OR REPLACE FUNCTION COMP(V_SALARY INT, V_COMM INT, V_BONUS INT) 
  RETURNS COMPENSATION
BEGIN ATOMIC
  RETURN
    COMPENSATION()..SALARY(V_SALARY)..COMM(V_COMM)..BONUS(V_BONUS);
END;

 Now the COMP row function can be used to insert values into the employee table.

INSERT INTO TEMP_EMPLOYEE
  VALUES ('000010',COMP(50000,10000,5000));

I'll let you decide whether or not you think structured types are useful for your application development! Here is the SQL required to retrieve the data from the table. 

SELECT EMPNO, PAY..SALARY, PAY..COMM, PAY..BONUS FROM TEMP_EMPLOYEE;

 

Table Functions

The last type of function is called a table function. While user-defined functions are great for doing calculations and returning single values, table functions allow you return entire tables back to the user. 

I'll let that last point sink in. You can return tables back to your SQL.

Before we dive into the nuances of creating a table function, here is an example of how you can use a table function to make development prototyping easier. In a prior article on DB2 OLAP SQL we used a STOCK table in the examples.

Note: The SQL examples that are found in this article are available as a downloadable text file for you to use. Nothing is worse than having to type in a set of SQL and having it fail because of a missing comma or misspelled column name.  A link to the file is provided at the end of the article.

First we created the STOCKS table, which contained all of the trading information for a particular stock over a period of time. 

create table stocks (
  symbol      char(4),
  tradingdate date,
  closeprice  int
);

Then a clever form of recursive SQL was used to generate 10 days of data for the table. 

insert into stocks
   with temp1(nm,tx,s1) as
     (
     values ('IDUG', date('2015-01-01'),100)
     union all
     select 'IDUG', tx + 1 day, s1+1
       from temp1
       where tx < date('2015-01-10')
     )
   select * from temp1;

So now that we have the table created, we can issue any SQL we want against it. If we wanted to provide this table to other users for their own testing, we could either grant access to it for them, or send them this SQL to use. Of course, the users who want to play with this SQL would need to have table creation privileges in order to create these objects themselves. 

What happens if they don't have the necessary privileges? Here's where table functions may help.

We can encapsulate the creation of this table in a table function. Here is the table function definition (we'll get into the details later).

CREATE FUNCTION STOCKS()
  RETURNS TABLE(symbol char(4), tradingdate date, closeprice int)
RETURN
  with temp1(nm,tx,s1) as
    (
    values ('IDUG', date('2015-01-01'),100)
    union all
    select 'IDUG', tx + 1 day, s1+1
      from temp1
      where tx < date('2015-01-10')
    )
  select * from temp1;

Now the STOCKS table function can replace our STOCKS table and recursive SQL in any of the SQL we use. To use the STOCKS table function, the FROM clause must use the TABLE(…) function call as shown in the following SQL: 

SELECT * FROM TABLE( STOCKS() );

You can double-check that the values are identical by running the following SQL: 

SELECT * FROM TABLE( STOCKS() )
EXCEPT
SELECT * FROM STOCKS;

There should not be any records found since they are identical tables. 

Here is a 30-day moving average calculation of our stock table using a table function instead of a real table.

WITH V1(SYMBOL, TRADINGDATE, MOVINGAVG) AS
   (
   SELECT SYMBOL, TRADINGDATE,
     AVG(CLOSEPRICE)
       OVER (
         PARTITION BY SYMBOL
         ORDER BY TRADINGDATE
         ROWS BETWEEN 29 PRECEDING AND CURRENT ROW
       )
   FROM TABLE( STOCKS() ) -- Instead of FROM STOCKS
     WHERE SYMBOL = 'IDUG'
   )
 SELECT SYMBOL, TRADINGDATE, DEC(MOVINGAVG,5,2) FROM V1
 ORDER BY SYMBOL, TRADINGDATE;

Note that the only change that was made to the SQL was substituting the table function for the name of the original table. The column names are defined as part of the table function, so there is no need to rewrite any of the SQL in the calculations.

There are some additional optimizations that we can do with the table function including restricting the rows that are returned. One should also consider whether or not this approach is the most efficient way of generating the data. The downside to this example is that the table gets generated every time you issue this SQL. 

Creating Table Functions

 You've already seen the syntax for creating a table function in the previous examples. The CREATE FUNCTION syntax needs just a few lines to create a usable table function:

CREATE FUNCTION NAME(arguments)
  RETURNS TABLE(COlumn_1 DATA_TYPE, COLUMN_2 DATA_TYPE, …)
RETURN … SQL statement …

You define the function name and include any arguments you want to give to the body of the function. For instance, the code could be modified to include a start date and an end date range to be used as part of the SQL.

CREATE FUNCTION STOCKS(START_DATE DATE, END_DATE DATE) 

Arguments in any function definition can have default values. If you don't supply a default value, the SQL compiler will issue an error message if any of the arguments are missing. The body of the our original table function would change to use these new arguments: 

WITH TEMP1(NM,TX,S1) AS
    (
    VALUES ('IDUG', START_DATE, 100)
    UNION ALL
    SELECT 'IDUG', TX + 1 DAY, S1+1
      FROM TEMP1
      WHERE TX < END_DATE
    )

If we try to use the function without arguments, we would not get an error. In fact, we get the identical results from when we first created the function. 

SELECT * FROM TABLE( STOCKS() );

 

SYMBOL TRADINGDATE         CLOSEPRICE
------ ------------------- -----------
IDUG   2015-01-01-00.00.00         100
IDUG   2015-01-02-00.00.00         101
IDUG   2015-01-03-00.00.00         102

DB2 uses function "signatures" to determine which is the proper function to invoke based on the arguments that are passed onto the function. In this case we created a STOCKS table function earlier that had no arguments. Then we subsequently created another STOCKS table function with two arguments. DB2 determined that the original version of the function was the one to invoke since we did not provide any arguments. 

Trying to use the SELECT statement with one argument will produce an error:

SELECT * FROM TABLE( STOCKS(CURRENT DATE) );

 

SQL0440N  No authorized routine named "STOCKS" of type "FUNCTION" 
having compatible arguments was found.  SQLSTATE=42884

In order to use our second table function, two parameters need to be supplied: 

SELECT * FROM TABLE( STOCKS( '2015-01-01', CURRENT DATE));

If you want to use default values, you can provide them in the function definition. The DEFAULT clause is added to the end each argument definition.

CREATE FUNCTION STOCKS(
  START_DATE DATE DEFAULT '2015-01-01',
  END_DATE   DATE DEFAULT CURRENT DATE
)

You can supply one, or both arguments on the function definition and DB2 will figure it out. 

SELECT * FROM TABLE( STOCKS('2015-06-01') );

If you can't remember the position of the arguments in the function, all you have to do is use the special => operator and the name of the argument in the function call. The two function calls below are equivalent: 

SELECT * FROM TABLE( 
  STOCKS(END_DATE=>CURRENT DATE, START_DATE=>'2015-03-01')
);
SELECT * FROM TABLE(
  STOCKS('2015-01-01',CURRENT DATE)
);

You can also leave out an argument and the default value will be used. These two SQL statements are equivalent: 

SELECT * FROM TABLE( STOCKS('2015-01-01') );
SELECT * FROM TABLE( STOCKS (END_DATE=>CURRENT DATE) );

There will be those reading this article and wonder what would happen if I supplied no arguments? What will DB2 do then? Luckily I tested it to make sure, but DB2 will use our first function definition since it best "matches" our syntax. If you really want to use the second version of our STOCKS function, you must supply two arguments to the function. If you want to use the default values, the keyword DEFAULT must be used for the parameter: 

SELECT * FROM TABLE( STOCKS(DEFAULT, DEFAULT) );

Advanced Functionality

So far we've only returned information from existing tables via a RETURN(SELECT…) statement. We may want to get a little more sophisticated in what we return to the user. Often we want to combine (join!) data from various tables and perhaps do some calculations on the data to create a derived column. 

If you are SQL-inclined (as opposed to a naïve user) you may be able to do that with some clever case statements and other SQL gymnastics. Clever SQL, no matter how well intended, is tough to write and to debug!

Another approach is to use the expressive power of SQL PL to do some of the work for you. This next example assumes you have the SAMPLE database installed on your system. There is an employee table that contains various bits of information about employees as well as their compensation. Employees are given a bonus every year based on a simple formula:

  • Calculate the average employee salary per department, but exclude any managers from the calculation
  • Take the average salary per department and take a percentage of that and use it as the bonus for the entire department (BUCKET)
  • Each employee is given a portion of this bonus based on the following calculation:

    • Sum all of the salaries in the department (TOTAL)
    • Divide the employee's salary by this sum
    • Multiply the result by the BUCKET amount to determine the bonus

For example, say a department has 5 employees including the manager. The total salary of these employees is $200,000 (20K, 40K, 60K and 80K). The average salary would be 50K. Take 20% of this (10K) and distribute among the employees based on their salary. So the employee with a salary of 20K would get a bonus of (20/200)*10K = 1K bonus. The other employees would get 2K, 3K, and 4K respectively and nothing for the manager. 

Could you do this in an application or one SQL statement? Sure you could (I'll leave this as an exercise for those of you who really like playing with SQL [Hint: use WITH statements]. Our code needs to do a couple of things. We start by determining what we want to return from our table function. At a minimum we want the employee number and their bonus. Note that we want all of the employees returned, even if they get no bonus because of their status. We also need to supply the percentage that we want used in calculating the bonus bucket.

The first step is to define the table function with the columns that we want the function to return:

CREATE OR REPLACE FUNCTION EMPLOYEE_BONUS(BONUS_PERCENT DEC(5,2))
  RETURNS TABLE(EMPNO CHAR(6), BONUS DECIMAL(9,2))

In order to figure out the actual bonus amounts, we first need to compute the sum of the salaries, the average salary, and finally the bonus bucket available. Note that the calculations exclude any employees that are managers (or the president). 

SELECT WORKDEPT, SUM(SALARY), AVG(SALARY)*(BONUS_PERCENT/100.0)
  FROM EMPLOYEE WHERE JOB NOT IN ('MANAGER','PRES')
GROUP BY WORKDEPT;

Here are the results of this query is (assuming a 20 percent bonus value): 

WORKDEPT             2            3
--------- ------------ ------------
A00             201500        10075
C01             210640        14043
D11             574370        11487
D21             262510         8750
E11             227390         7580
E21             196370         7855

These results could be placed into a temporary table and then joined with the employee table to get the bonus results. Rather than creating a temporary table, we could use the WITH clause and generate a result set within the SQL statement and use that to generate the list of employees. 

WITH DEPT_BONUS(WORKDEPT, TOTAL, BUCKET) AS
  (
  SELECT WORKDEPT, SUM(SALARY), AVG(SALARY)*(BONUS_PERCENT/100.0)
    FROM EMPLOYEE
  WHERE JOB NOT IN ('MANAGER','PRES')
  GROUP BY WORKDEPT
  )
SELECT  E.EMPNO, E.JOB, E.SALARY, B.TOTAL, B.BUCKET
  FROM  EMPLOYEE E, DEPT_BONUS B
  WHERE E.WORKDEPT = B.WORKDEPT;

Executing this SQL will get us the employee, their job title, current salary, and the department total (TOTAL) and bonus amount (BUCKET). Some sample results are found below. Note that we are including all of the employees in the list, even if they are not eligible for the bonus. 

EMPNO   JOB                  3            4            5
------- --------- ------------ ------------ ------------
000010  PRES            152750       201500        10075
000030  MANAGER          98250       210640        14043
000060  MANAGER          72250       574370        11487

The SQL can be wrapped in a DECLARE CURSOR statement that allows the code to open the cursor and iterate through each employee record one at a time. 

DECLARE EMP_CURSOR CURSOR FOR
    WITH … SQL …

We need to add an event handler for when we run out of records, and the logic for calculating the bonus. The event handler is used to tell the program that we have no more records in our select list. 

DECLARE EOF INT DEFAULT 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET EOF = 1;

The body of the program would loop through the records until we run out of rows.

OPEN EMP_CURSOR;
FETCH FROM EMP_CURSOR INTO EMP;
WHILE EOF = 0 DO
  … logic …
  FETCH FROM EMP_CURSOR INTO EMP;
END WHILE;
CLOSE EMP_CURSOR;

The table function will take advantage of a row data type to make data movement easier between the SQL results and the variables in the program.

DECLARE TYPE EMPLOYEE_ROW AS
     ROW(
          EMPNO    CHAR(6),
          JOB      CHAR(8),
          SALARY   DECIMAL(9,2),
          TOTAL    DECIMAL(9,2),
          BUCKET   DECIMAL(9,2)
        );        
DECLARE EMP EMPLOYEE_ROW; 

The EMPLOYEE_ROW variable can now be used in a FETCH statement to retrieve all of column data with one variable name rather than 5 separate variables:

FETCH FROM EMP_CURSOR INTO EMP;

By using the EMP variable, the individual columns can be referenced using the syntax EMP.<column>. Using EMP.EMPNO would access the EMPNO value.

We also need to define a variable for our bonus calculation.

DECLARE V_BONUS DEC(9,2);

The bonus calculation will use the logic described earlier. If the employee is a manager or the president, they will not get any bonus. Otherwise, the employee will get a bonus based on their current salary and the total bonus money available.

IF EMP.JOB IN ('MANAGER','PRES') THEN
   SET V_BONUS = 0;
ELSE
   SET V_BONUS = ( EMP.SALARY / EMP.TOTAL ) * EMP.BUCKET;
END IF;

At this point you are probably wondering how you are returning a table when the program is calculating bonuses on an individual basis? The first example in this article used the return statement with a SELECT statement to generate the rows that will be returned.  Returning an individual row at a time back to an application (or SQL statement), requires the used of the PIPE function:

PIPE(EMP.EMPNO, V_BONUS);

The PIPE function indicates to DB2 that this row should be returned back to the calling function, application, or SQL statement. The program pauses at this point as the calling application processes the row. If another row is requested, the program logic will continue after the PIPE function. If the calling application does not require any further rows, it can just cancel the request. The advantage of this approach is that the table function does not need to process all of the rows if the calling application does not require it.  The final code for the table function is found below.

CREATE OR REPLACE FUNCTION EMPLOYEE_BONUS(BONUS_PERCENT DEC(5,2))
  RETURNS TABLE(EMPNO CHAR(6), BONUS DECIMAL(9,2))
BEGIN
  DECLARE TYPE EMPLOYEE_ROW AS
     ROW(
          EMPNO    CHAR(6),
          JOB      CHAR(8),
          SALARY   DECIMAL(9,2),
          TOTAL    DECIMAL(9,2),
          BUCKET   DECIMAL(9,2)
        );
       
  DECLARE EMP      EMPLOYEE_ROW;
  DECLARE V_BONUS  DEC(9,2);
  DECLARE EOF      INT DEFAULT 0;
 
  DECLARE EMP_CURSOR CURSOR FOR
    WITH DEPT_BONUS(WORKDEPT, TOTAL, BUCKET) AS
      (
      SELECT WORKDEPT, SUM(SALARY), AVG(SALARY)*(BONUS_PERCENT/100.0)
      FROM EMPLOYEE WHERE JOB NOT IN ('MANAGER','PRES')
      GROUP BY WORKDEPT
      )
    SELECT  E.EMPNO, E.JOB, E.SALARY, B.TOTAL, B.BUCKET
    FROM    EMPLOYEE E, DEPT_BONUS B
      WHERE E.WORKDEPT = B.WORKDEPT;
 
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET EOF = 1;
 
  OPEN EMP_CURSOR;
  FETCH FROM EMP_CURSOR INTO EMP;
  WHILE EOF = 0 DO
    IF EMP.JOB IN ('MANAGER','PRES') THEN
       SET V_BONUS = 0;
    ELSE
       SET V_BONUS = ( EMP.SALARY / EMP.TOTAL ) * EMP.BUCKET;
    END IF;
    PIPE(EMP.EMPNO, V_BONUS);
    FETCH FROM EMP_CURSOR INTO EMP;
  END WHILE;
  CLOSE EMP_CURSOR;
  RETURN;
END

To run this table function, you would use the standard table function-calling format.

SELECT * FROM TABLE( EMPLOYEE_BONUS(20) )
  WHERE BONUS BETWEEN 2000 AND 5000;

 

EMPNO          BONUS
------- ------------
000110       3325.00
000120       2462.50
000130       4920.00
000140       4561.33
200010       2325.00
200140       4561.33 

If you want to hide the complexity of the table function from users, you could always create a view than encapsulates the function. 

CREATE VIEW BONUS_EMPLOYEE AS 
  SELECT * FROM TABLE(EMPLOYEE_BONUS(20));

Now you have the ability to use the name BONUS_EMPLOYEE as a table name in SQL rather than using the TABLE() function. 

SELECT * FROM BONUS_EMPLOYEE
  WHERE BONUS BETWEEN 2000 and 5000;

 The only drawback from this is that the function parameter is set to 20 percent. You must recreate the view in order to change the percentage that is used for the bonus calculation. 

CREATE OR REPLACE VIEW BONUS_EMPLOYEE AS 
  SELECT * FROM TABLE(EMPLOYEE_BONUS(50));

Summary

Table functions can be a great addition to your SQL toolkit.  Table functions can be used to generate result sets that can be used in place of physical tables. These functions can have parameters as input to modify their behavior and change the results on a per-call basis. These types of capabilities are not available in standard views so table functions can provide much more sophisticated capabilities. 

Sample SQL Link

Author

George Baklarz, B. Math, M. Sc, Ph.D. Eng. 

George has spent 30 years in IBM working on various aspects of database technology. From 1987 to 1991 he worked on SQL/DS as part of the product planning department, system test team, performance team, and application development group. In 1991, Baklarz was part of the team that helped move the OS/2 ES database to Toronto to become part of the DB2 family of products. Since this time he has worked on vendor enablement, competitive analysis, product marketing, product planning, and technical sales support. He currently manages the Worldwide Core Database Technical Sales department.

 

Recent Stories
Selecting Appropriate Data Architecture Technologies

SQL Tuning for Developers – Part 2 Explain Output

Some of my favorite tunings tips and tricks for DB2 by Joe Geller