Blogs

Lateral join

By Emil Kotrc posted Nov 28, 2018 01:21 AM

  

After a discussion with a friend, I came across an interesting article about the modern SQL in PostgreSQL (sorry the article is in czech). What raised my attention was a mention about the Lateral join. I work mostly on Db2 for z/OS and haven't heard the term in this context. What about you?

In short - lateral join or lateral correlation is simply a reference in a nested table expression to a column of a table identified at the same level in the FROM clause. In other words, it allows you to refer the columns of the left table in a nested table expression on the right side of a join. Normally, in the nested table expression, you can refer only the tables from a higher level in the hierarchy.

Let's take a look at a simple use case on the well known IBM sample database shipped with Db2. In this article we will use the EMP (employee) and DEPT (department) tables. Let our task be as simple as creating a report showing all departments and listing three employees with the highest salary in each department.

As usual, there are several possible solutions and you may show your solution in the comments below. We will mention just two here

  1. using an OLAP expression, and
  2. using the lateral join.

OLAP expression

The OLAP (Online Analytical Processing) expressions also known as window functions are very powerful and useful. They were introduced in Db2 10 for z/OS (GA in 2010).

The solution using the window function can look like this:

WITH x AS ( 
SELECT * FROM (
SELECT workdept, lastname, salary,
RANK() OVER (PARTITION BY workdept ORDER BY
salary DESC) AS r FROM emp) AS s
WHERE s.r <= 3)
SELECT workdept, deptname, lastname, salary FROM x, dept
WHERE x.workdept = deptno ORDER BY deptno, salary DESC, lastname;

which produces the following report on my test system:

WORKDEPT DEPTNAME                      LASTNAME   SALARY
A00 SPIFFY COMPUTER SERVICE DIV. HAAS 52750.00
A00 SPIFFY COMPUTER SERVICE DIV. HEMMINGER 46500.00
A00 SPIFFY COMPUTER SERVICE DIV. LUCCHESI 46500.00
B01 PLANNING THOMPSON 41250.00
C01 INFORMATION CENTER KWAN 38250.00
C01 INFORMATION CENTER NATZ 28420.00
C01 INFORMATION CENTER NICHOLLS 28420.00
D11 MANUFACTURING SYSTEMS STERN 32250.00
D11 MANUFACTURING SYSTEMS JOHN 29840.00
D11 MANUFACTURING SYSTEMS LUTZ 29840.00
D21 ADMINISTRATION SYSTEMS PULASKI 36170.00
D21 ADMINISTRATION SYSTEMS MARINO 28760.00
D21 ADMINISTRATION SYSTEMS MONTEVERDE 28760.00
E01 SUPPORT SERVICES GEYER 40175.00
E11 OPERATIONS HENDERSON 29750.00
E11 OPERATIONS SCHNEIDER 26250.00
E11 OPERATIONS SCHWARTZ 26250.00
E21 SOFTWARE SUPPORT SPENSER 26150.00
E21 SOFTWARE SUPPORT LEE 25370.00
E21 SOFTWARE SUPPORT WONG 25370.00

The basic idea here is to use the RANK() window function and select just the employees with the highest rank (less or equal to 3 in this case). To get the department names we need to join the employee table with the department table.

Lateral join

The above solution works, but what do we want to achieve really? Maybe, using the window function is an overhead and we could use a more simple solution?

Basically, we need to query the employee table and fetch just the 3 employees with the highest salary. If that is the only task, we can do it for a single department as easily as:

SELECT lastname, salary FROM emp e
WHERE e.workdept = 'department' ORDER BY salary DESC FETCH FIRST 3 ROWS ONLY;

where department stands for a given specific department. However, we would like to have it for every department and report also the department name. To make it work, we need to join the department table with the aforementioned table expression and refer to the department in the join. What if we try the naive solution expressed here:

SELECT workdept, deptname, lastname, salary FROM dept d,                                                 
(SELECT lastname, salary FROM emp e WHERE e.workdept = d.deptno
 ORDER BY salary DESC
 FETCH FIRST 3 ROWS ONLY);                    
ORDER BY deptno, salary DESC, lastname;

Well, the execution in Db2 for z/OS fails with:

DSNT408I SQLCODE = -206, ERROR:  D.DEPTNO IS NOT VALID IN THE CONTEXT
        WHERE IT IS USED                                             
DSNT418I SQLSTATE   = 42703 SQLSTATE RETURN CODE                     
DSNT415I SQLERRP    = DSNXORSO SQL PROCEDURE DETECTING ERROR         
DSNT416I SQLERRD    = -225  0  0  -1  0  0 SQL DIAGNOSTIC INFORMATION
DSNT416I SQLERRD    = X'FFFFFF1F'  X'00000000'  X'00000000'         
        X'FFFFFFFF'  X'00000000'  X'00000000' SQL DIAGNOSTIC       
        INFORMATION

The SQLCODE shows that you cannot refer to the department table in the nested table expression on the same level in the join. Cannot you? Obviously you can, but it requires a special syntax - just adding a TABLE keyword. It is that simple!

SELECT workdept, deptname, lastname, salary FROM dept d, TABLE 
(SELECT workdept, lastname, salary FROM emp e WHERE e.workdept = d.deptno
 ORDER BY salary DESC  
 FETCH FIRST 3 ROWS ONLY)
ORDER BY deptno, salary DESC, lastname;

Have you noticed the change? See the TABLE keyword before the nested table expression selecting from EMP.

The output is the following and matches our prior result:

WORKDEPT DEPTNAME                      LASTNAME   SALARY
A00 SPIFFY COMPUTER SERVICE DIV. HAAS 52750.00
A00 SPIFFY COMPUTER SERVICE DIV. HEMMINGER 46500.00
A00 SPIFFY COMPUTER SERVICE DIV. LUCCHESI 46500.00
B01 PLANNING THOMPSON 41250.00
C01 INFORMATION CENTER KWAN 38250.00
C01 INFORMATION CENTER NATZ 28420.00
C01 INFORMATION CENTER NICHOLLS 28420.00
D11 MANUFACTURING SYSTEMS STERN 32250.00
D11 MANUFACTURING SYSTEMS JOHN 29840.00
D11 MANUFACTURING SYSTEMS LUTZ 29840.00
D21 ADMINISTRATION SYSTEMS PULASKI 36170.00
D21 ADMINISTRATION SYSTEMS MARINO 28760.00
D21 ADMINISTRATION SYSTEMS MONTEVERDE 28760.00
E01 SUPPORT SERVICES GEYER 40175.00
E11 OPERATIONS HENDERSON 29750.00
E11 OPERATIONS SCHNEIDER 26250.00
E11 OPERATIONS SCHWARTZ 26250.00
E21 SOFTWARE SUPPORT SPENSER 26150.00
E21 SOFTWARE SUPPORT LEE 25370.00
E21 SOFTWARE SUPPORT WONG 25370.00

By adding the TABLE keyword (also known as the lateral correlation) Db2 allows us to have a join of a table and a nested table expression, which can have a correlated reference to the previous (left) table, not just a reference to a table from a higher level. This is what is called lateral join.

Enjoy! This feature is in Db2 for z/OS since version 6 (GA in 1998)!

Documentation

If you check the documentation, you will find the following information:

  • Db2 for z/OS syntax for the nested table expression speaks about the TABLE reference, check the nested-table-expression syntax. It says:

For table functions, an additional capability exists. A table function can contain one or more correlated references to other tables in the same FROM clause if the referenced tables precede the reference in the left-to-right order of the tables in the FROM clause. The same capability exists for nested table expressions if the optional keyword TABLE is specified; otherwise, only references to higher levels in the hierarchy of subqueries is allowed.

  • if you look into the Db2 for LUW documentation, it speaks about LATERAL keyword in the nested-table-expression clause also. The TABLE is just a synonym to LATERAL.
  • The LATERAL join/correlation is also being discussed in the SQL Reference for Cross Platform Development. The definition is as follows:

A reference to a column of a table identified at the same level in a nested table expression that is preceded by the TABLE keyword is called lateral correlation.

RANK() vs DENSE_RANK()

Before concluding the article, I will add one side note. You might have noticed that for the department A00, there are two employees with the same salary. We see three employees, but we don't see the third greatest salary for A00. The reason for that is that we used the RANK() function, which might leave some gaps in the ranking. Have we used the DENSE_RANK() function instead, there would be no gaps, and the query with the DENSE_RANK() might in fact return four or more rows. However, in such case, the solution would result in a different output than the solution with the lateral join. Keep this in mind as the results may vary depending on the method you use. Moreover, we might get different results even with the RANK() function if there are more than two employees with the same salary.

Conclusion

Lateral join gives you some new possibilities for nested table expressions and joins. The obvious question is whether to use lateral join or the window function, or something else in similar cases as discussed here. As usual it depends. I did not do any performance testing nor EXPLAINs for these two options and will let it to you. Please share your thoughts and findings in the comments below.

 

0 comments
10 views