On Clause vs Where Clause Filtering Predicates in Joins

Posted By: Tony Andrews Technical Content,

When coding SQL joins, is there any difference in where you code your filtering predicates?  We know that when you code SQL using the ANSI-92 syntax, the join predicate must be coded as part of the ON clause. But the filtering predicates can be coded as part of the ON or the WHERE clauses.

This ANSI-92 syntax is best for a couple of reasons:

  • You can never have an orphan table that is not being joined to, thus avoiding cartesian joining.  Also called cross joins.
  • Separates the join relationship logic in the ON clause from the filtering logic in the WHERE clause.

Many say this is cleaner and easier to read and understand.

But what about seeing a filtering predicate as part of the ‘ON’ clause?  When it comes to inner joins, I have not seen a difference in logic or performance depending on where the filtering predicates are placed.  But I do not like to say ‘always’ or ‘never’ as there may be an example somewhere.  In IBM documentation it states: ‘ For inner joins, ON clause predicates can supply the join condition and local filtering, and they are semantically equivalent to WHERE clause predicates’.

For example, the following 2 queries are identical. 

Provide a list of the department female managers:  If you run explains on the two queries, you will identical execution plans.

SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D INNER JOIN  EMP E
     ON D.MGRNO = E.EMPNO
  AND E.SEX = ‘F’ 
;
SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D INNER JOIN  EMP E
     ON D.MGRNO = E.EMPNO
 WHERE E.SEX = ‘F’ 
;

But when it comes to outer joins, there is a difference.  For example, look at the output difference between the same two queries, but changed to a LEFT JOIN.

SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D LEFT   EMP E
     ON D.MGRNO = E.EMPNO
  AND E.SEX = ‘F’ 
;

Example 1:

SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D LEFT JOIN  EMP E
     ON D.MGRNO = E.EMPNO
 WHERE E.SEX = ‘F’ 
ORDER BY D.DEPTNO
;

Example 2:

In the first left join example, having the E.SEX = ‘F’ as part of the ON clause is applying logic specific to the

Join (and not actually filtering rows).  Left join logic states to keep the rows from the left table even when no match (based on the join predicate) is found in the other table. By having the additional predicate as part of the ON clause dictates when to bring back data from the other table. For this example, it states that even if a join match (on the join column) is found, the EMP E data then only matches when the SEX='F'.  If it doesn't have SEX='F', the left-hand table data is still returned with nulls from the EMP table.

In the second left join example, having the E.SEX=’F’ as part of the WHERE clause states for the result set to only contain those rows.  In left join logic, the second table is called the null supplying table because if a match is not found, nulls are returned for any columns specified in the SELECT for that table (in this example that would be the E.LASTNAME).  When you code a WHERE predicate on the null supplying table, you are stating that you do not want to see anything (including nulls) returned from the null supplying table other than when a match is found and the SEX=’F’.  So, it really becomes an inner join.  If you look at an explain for left join queries that contain WHERE logic against the null supplying table, you will see that the optimizer rewrites it as an inner join.  The predicate as part of the WHERE clause then actually filters data.

Another example with left join:

SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D LEFT   EMP E
     ON D.MGRNO = E.EMPNO
  AND D.DEPTNO LIKE ‘D%’  
;

Example 3:

SELECT D.DEPTNO, D.DEPTNAME, 
             D.MGRNO, E.LASTNAME
FROM DEPT D LEFT   EMP E
     ON D.MGRNO = E.EMPNO
WHERE D.DEPTNO LIKE ‘D%’  
ORDER BY D.DEPTNO
;

Example 4:

In example 3, only the departments that begin with ‘D’ are to be joined for a possible match on the EMP table and bring back any EMP E data. Sometimes, logic is to bring back data from the other table based on specific conditions. Left join logic join states to produce all the rows from the DEPT table regardless of a match or not.  But now with the additional ON clause predicate, it states to bring back EMP E data only for those departments that begin with a ‘D’. When the ON filtering predicate is on the left join table, it will only go try to find a match based on that condition. In this example only 2 times (‘D11’, ‘D21’) because the third ‘D’ department has a MGRNO of null. This same logic can be handled by CASE logic in the SELECT clause that dictates what to show on each returned column as part of a row’s output.

For example:

SELECT D.DEPTNO, D.DEPTNAME, D.MGRNO,
       CASE 
         WHEN D.DEPTNO LIKE 'D%' THEN E.LASTNAME
         ELSE                        NULL
       END  AS LASTNAME    
FROM DEPT D LEFT JOIN EMP E
   ON D.MGRNO = E.EMPNO
 ORDER BY D.DEPTNO

With the CASE logic, joins take place and bring back LASTNAME matches, and then the CASE logic decides whether to show the name or not. The D.DEPTNO LIKE ‘D%’ as part of the ON clause will only try to find a match on that condition.

In example 4 the predicate moved to the WHERE clause now becomes a filtering predicate, stating to only bring back the rows where the department begins with a ‘D’. Because the filtering predicate is on the left table, the filtering will take place and then the left join processing takes place.

Summary:

Predicates can be placed in the ON or WHERE parts of an INNER JOIN and have no effect on either the logic, or performance. Bur best practice states to have only the join predicates as part of the ON clause.

For outer joins, placement of predicates affects logic. If placed as part of the ON join logic, this states to only join and bring back data from the null supplying table based on the predicate(s) condition while still maintaining left join logic.

For outer joins, these predicates in the WHERE become filtering and affect specifically the rows being returned in the result set.

A best practice is to put filtering predicates where they belong logically. If it specific to the join operation, then it becomes part of the ON clause, else keep it in the WHERE clause. But keep in mind that predicates in the ON clause only affect the joining and not the filtering out of rows.

As we look at these examples, there may be other ways to get the same results. The purpose of this article is to show the difference between having what looks like filtering predicates as part of the ON versus WHERE. Only testing will tell!