Blogs

DB2 User-Defined Functions – Extending Database Capabilities

By Daniel Luksetich posted Nov 25, 2015 10:26 AM

  

DB2 comes with a wide variety of built-in functions that enable some incredible functionality within the context of SQL statements. These functions allow for developers to very quickly incorporate application functionality inside SQL statements for fast deployment of features and reports. This is extremely important when functionality is required quickly and centrally via data service calls. Since a service call is a single statement, getting the most work into that one statement is important. Functions provide additional complexity that can satisfy multiple requirements in a single database call.

Even though DB2 has a very extensive list of built-in functions these functions may not be enough to satisfy a given requirement. Perhaps also, there is a legacy process or functionality that is required as part of a service that is not accessible via SQL. In these situations all is not lost. Anyone can incorporate any needed functionality into the database via user-defined functions (UDFs). UDFs are functions that users can write themselves and install into DB2, thus extending the built-in capabilities of DB2. What are the limits to the functionality one can incorporate via UDFs? None!

Types of Functions Available

Functions are defined by their signature. The signature consists of a schema name, function name, and zero or more parameters of distinct data types. UDFs can also be “overloaded”, that is you can have the same UDF name with a variety in the number and data types of parameters. This is a form of polymorphism typical in object oriented programming. There are four major types of functions built into DB2 for z/OS and LUW:

  • Scalar: Zero or more parameters as input and returning a single value as output. An example would be the SUBSTR function. SUBSTR(‘BOBRADY’,1,3)=’BOB’.
  • Aggregate (or Column): Zero or more parameters as input and returning a single value as output. An aggregate function operates against a set of data. That is, an aggregate function operates across multiple rows of a table. An example would be COUNT(*), which returns a count of the number of rows in a result set.
  • Row: Zero of more parameters as input and returning a row of data. An example of a row function would be the UNPACK function in DB2 for z/OS.
  • Table: Zero of more parameters as input and returning a table as the result. Table functions can be accessed in a FROM clause and treated just like a table, including joining to other tables. One example is the XMLTABLE function which accepts an XML document and XPATH expression as input, and returns a relational table.

UDFs can be coded as any of these types of functions. In addition, there can be several types of UDFs.

  • Sourced UDFs: These are UDFs that are based upon existing built-in functions or other UDFs. A common use of sourced UDFs is with user-defined distinct data types.

CREATE TYPE DOLLAR AS DEC(13,2);

CREATE FUNCTION AVG (DOLLAR) RETURNS DOLLAR
SOURCE SYSIBM.AVG (DECIMAL(13,2));

  • SQL Scalar UDFs: These are scalar functions that can be either inline SQL (built on a single SQL expression), or non-inline SQL (also containing SQL PL).
  • External Scalar UDFs: These are UDFs that are actually written in an external programming language, such as COBOL, C, PL/I, Java, etc.
  • SQL Table and Row UDFs: These are table and row functions that can be written in SQL, including SQL PL (only on DB2 for LUW for now), and can be accessed as if they are a table in normal SQL statements. Row UDFs are only available on DB2 for LUW, but in DB2 for z/OS a single row table function can be substituted.
  • External Table and Row UDFs: These are table and row functions that are written in an external programming language. Again, row UDFs are only available on DB2 for LUW, but in DB2 for z/OS a single row table function can be substituted.

Dan.jpg

The diversity of UDFs really opens many doors to extending the capabilities of DB2 in many ways! There really is no limit to what can be done! UDFs can also invoke other UDFs and stored procedures, extending capabilities even further. This article only discusses a few of these possibilities.

Scalar UDFs

Scalar UDFs can have no limit, except that they can only return a single value. Given that the result is a single value, then the invocation of a scalar UDF can be placed virtually anywhere in an SQL statement where a single value can be placed. This includes a SELECT list, a predicate, a SET clause, inside a CASE expression, and much more! Scalar UDFs can vary from the most simple to the most complicated.

One simple example is an AGE function, for which there is no built-in function available in DB2. If you want to calculate the age of someone from the DB2 sample EMPLOYEE table you could code a statement such as this:

SELECT E.EMPNO, E.LASTNAME, (YEAR(CURRENT_DATE) - YEAR(E.BIRTHDATE))
FROM   EMP E
WHERE  E.EMPNO = '000010'

Since a single SQL expression is used to calculate AGE it can be incorporated into an inline SQL scalar function.

CREATE FUNCTION DANL.AGE (BDATE DATE)
RETURNS SMALLINT
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
RETURN (YEAR(CURRENT_DATE) - YEAR(BDATE))

Now the AGE function is available to any user or developer.

SELECT E.EMPNO, E.LASTNAME, DANL.AGE(E.BIRTHDATE)
FROM   EMP E
WHERE  E.EMPNO = '000010'

The advantage to this is ease of coding (especially for complicated expressions), centralization of business rules, and quality control of the business rules. This can lead to faster and more accurate application development. This concept is true for any UDF!

If functionality of the UDF is more complicated than a simple expression then perhaps a non-inline SQL scalar function is in order. This type of UDF can contain SQL PL and SQL statements to any level of complexity. These are coded like Native SQL PL stored procedures but return a single value instead of parameters or result sets.  Here is a simple example of a non-inline SQL scalar UDF that uses SQL PL to concatenate two strings together regardless of whether or not they contain NULL values.

CREATE FUNCTION DANL.CCAT
(STRING1 VARCHAR(4000), STRING2 VARCHAR(4000))
RETURNS VARCHAR(4000)
LANGUAGE SQL
DETERMINISTIC
CONTAINS SQL
CALLED ON NULL INPUT
BEGIN
   IF STRING1 IS NULL THEN RETURN STRING2; END IF;
   IF STRING2 IS NULL THEN RETURN STRING1; END IF;
   RETURN STRING1 CONCAT STRING2;
END

This functionality can now be used virtually anywhere in an SQL statement. Remember, these non-inline SQL scalar functions can include SQL statements and calls to other routines!

External scalar UDFs are important when access to external processes or external resources is required within the context of an SQL statement. This can be quite significant especially when trying to incorporate legacy processing into SQL. Imagine converting a legacy program to SQL access within a few hours. It is possible! In one example, access to a legacy soundex function (different from built-in DB2 soundex function) was required within SQL in a new application. The problem was that the legacy soundex function was written in IBM assembler language and the source code was lost! A quick solution was to code a COBOL program using the proper parameters for a UDF and to have the COBOL program call the assembler program to get the value. Within 8 hours the legacy soundex functionality was now available within DB2!

SELECT LASTNAME, DANL.SOUNDEX(LASTNAME)
FROM   EMP
WHERE  EMPNO = '000010';

Dan2.jpg

Table UDFs

A table UDF is a function that actually returns a table to an SQL statement. These UDFs can be SQL based, SQL PL base, or even written in an external language. The simplest form of a table UDF is the SQL table UDF, otherwise known as parameterized views. These UDFs accept zero or more parameters as input and actually return a table to DB2. The table function definition includes the column names and data types of the output table.

CREATE FUNCTION DANL.DEPTDTLS (DEPTNO CHAR(06))
RETURNS TABLE (AVGSAL DECIMAL(9,2), EMPCNT INT)
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
RETURN
SELECT AVG(SALARY), COUNT(*)
FROM   EMP
WHERE  WORKDEPT = DEPTNO; 

Once defined, a table UDF can then be used in a FROM clause along with the keyword TABLE.

SELECT E.EMPNO, E.LASTNAME, DT.AVGSAL
FROM   EMP E
INNER JOIN
       TABLE(DANL.DEPTDTLS(E.WORKDEPT)) AS DT
ON 1=1 

Table UDFs can be used much like any other table reference in a FROM clause. In addition to SQL table UDFs it is also possible to include SQL PL in a table UDF, but only on DB2 for LUW. Expect SQL PL in a table function in an upcoming release of DB2 for z/OS.

CREATE FUNCTION DEPTEMPLOYEES (DEPTNO CHAR(3))
RETURNS TABLE (EMPNO CHAR(6),
LASTNAME VARCHAR(15),
FIRSTNAME VARCHAR(12))
LANGUAGE SQL
MODIFIES SQL DATA
NO EXTERNAL ACTION
DETERMINISTIC
BEGIN ATOMIC
   INSERT INTO AUDIT
   VALUES (USER, 'Table: EMPLOYEE Prd: DEPTNO = ' CONCAT DEPTNO); 
   RETURN
   SELECT EMPNO, LASTNAME, FIRSTNME
   FROM EMPLOYEE
   WHERE EMPLOYEE.WORKDEPT = DEPTEMPLOYEES.DEPTNO
END

It is also possible to create external table UDFs using any of the languages that are available for stored procedures. In this way a program can be written to access an external resource and return that data to DB2 in the form of a table. Imagine reading a text file, returning that file to DB2 in the form of a table, and then joining that data to other DB2 tables. All of this can be done in a single SQL statement!

UDF Performance

There are virtually no performance benefits to UDFs. Even inline SQL scalar UDFs have more overhead then executing the single expression on its own due to casting of the input variables. UDFs can be executed many times when placed in a SELECT list, and external UDFs execute in an external process or workload managed address space and thus can be very expensive if executed many times during the processing of an SQL statement. UDF overhead does not show up in EXPLAIN output and so the only way to measure or predict UDF performance is by performing benchmark tests. In spite of the performance implications, UDFs can make the impossible possible as well as save days, weeks, or even months of programming time. Therefore, you must understand the cost/benefit of incorporating UDFs into a design.

An SQL table function can be a performance benefit if used as a parameterized view whereas the view without the parameter will be materialized in referencing SQL statements even though few rows are processed. An SQL table function will also force an access path of nested loop for joins to the table function. Be careful with this, and as always run benchmark tests to make sure it really is a performance improvement.

The DETERMINISTIC setting of a UDF will determine whether or not a UDF will cause the materialization of nested table expressions. This can be either a positive or negative impact on performance. By definition, a deterministic function returns the same result given the same input. So, SUBSTR(‘BOB’,1,1) will always return “B” and SUBSTR(LASTNAME, 1, 1) will always return the first byte of the LASTNAME column. Therefore, the SUBSTR function is deterministic. In contrast the RAND() function will always return a different result, and so it is a classic example of a non-deterministic function. It should be noted that NOT DETERMINISTIC is the default setting for UDFs, and even the most simple of inline SQL scalar functions will result in materialization of nested table expressions if this setting is not changed. Use this setting carefully and test the impact on nested table expressions by checking EXPLAIN output for materialization and by running benchmark tests.

Summary

UDFs are an incredible way to extend the functionality of the database, incorporate programming into DB2, and access external resources or processes from within the context of an SQL statement. The only limited to the functionality of DB2 at that point is imagination! UDFs should be used with an understanding of the balance between functionality, speed of delivery, and performance.

Daniel Luksetich
DanL Database Consulting
danl@db2expert.com

0 comments
10 views