SQL Tuning for Developers – Part 1
I find when it comes to tuning a query, program, or even an application that most developers are not sure where to begin or what may be causing response time issue(s). Many relational tuning experts agree that the majority of performance problems with applications that access a relational database are caused by poorly coded programs or improperly coded SQL. Poor performing SQL is responsible for much of the response-time issues. SQL developers should be informed of the many performance issues associated with the SQL language and also the way they design their programs. It would be especially helpful if more developers were familiar with reading and analyzing DB2 explain output. If developers can look for just a few of the basic/obvious areas of possible performance issues from an explain output, that could help minimize many of the performance issues that arise. This article is a great start for developers, analyst, testers, or anyone that writes SQL. Everyone can empower themselves when it comes to finding performance issues in areas that they can control, and for this article that starts with the SQL statement itself.
Not all performance issues come from the application side, or even the SQL statement itself, but many of them do and the more people we can educate in this area, the fewer performance problems and incident reports come out of production environments. So when it comes to tuning, the SQL statements should be the first place to start. Following are the steps to take when analyzing a SQL statement that may not be performing well.
Step 1: Check every predicate for possible rewrites
- Can any of them be rewritten differently? Rewriting a predicate a different way while still maintaining its logic can sometimes cause the optimizer to take a different access path. For example if the data you are querying only contains department values ‘D01’, ‘D11’ and ‘D21’ and you want to get all of them, you could code any of the following. Rewriting predicates differently is not a guarantee of an optimization access path change, but at times it does.
For example: WHERE DEPTNO LIKE ‘D%’
WHERE DEPTNO BETWEEN ‘D01’ and ‘D21’
WHERE DEPTNO IN (‘D01’, ‘D11’, ‘D21’)
WHERE DEPTNO >= ‘D01’ and DEPTNO <= ‘D21’
- Some predicate logic can just be rewritten different. For example it is typically more efficient to have And’ed predicates over ‘Or’d predicates whenever possible.
For example: WHERE (DEPTNO = 'A00' AND GENDER = 'F')
OR (DEPTNO = 'A00' AND EDLEVEL > 16)
can rewritten as
WHERE (DEPTNO = 'A00')
AND (GENDER = 'F' OR EDLEVEL > 16)
- Subquery predicates can always be rewritten a different way. Non-correlated subqueries can be rewritten as correlated subqueries and vice versa. The important point here is subquery predicate logic written differently (yet maintaining the same logic) executes very differently.
For example: WHERE D.DEPTNO IN
(SELECT DEPTNO FROM PROJ)
can be rewritten as
(SELECT 1 FROM PROJ P
WHERE P.DEPTNO = D.DEPTNO)
Note #1: When a query contains multiple subqueries, keep in mind that DB2 will execute a non-correlated subquery before any correlated subquery no matter the physical order of the predicates. So it’s best to code all subqueries in a query the same type and code them in most restrictive order to least restrictive order.
Note #2: The optimizer will at times rewrite a non-correlated to a correlated (or vice versa) or even rewrite the subquery to a join. This can undo Note #1 logic. But knowing this should not stop us from the rewriting and reordering.
The idea here is that when working on a query to get it perform faster, rewriting predicates will sometimes change the optimization access path, which will then change the runtimes. This is a great first step.
Step 2: Keep predicate logic simple.
Do any of the predicates contain functions, mathematics, or not logic on the columns? Can they be rewritten without? For most predicates that contain functions or mathematics on a column, Db2 puts that predicate into a stage 2 non-indexable category. This is the most expensive predicate in SQL processing, and will not take into account any associated index for the column involved.
For example the following are not efficient predicates and should be rewritten:
WHERE RTRIM(LASTNAME) = ‘ANDREWS’
WHERE HIREDATE – 7 DAYS > ?
WHERE SALARY * 1.1 BETWEEN ? and ?
WHERE SALARY NOT BETWEEN 25000 and 40000
WHERE UPPER(LASTNAME) = ‘ANDREWS’
Note: Db2 V11 now handles some of the more common functions SQL developers used in WHERE logic as stage 1 indexable due to the optimizer doing the rewrite for us. See this article from last year for details.
Step 3: Check for any stage 2 predicates
- Are there any stage 2 predicates? You would need to execute a Db2 explain that will highlight any of these predicates or go to the SQL manual for your specific Db2 version. Rows retrieved in processing can have certain predicates applied during stage 1, and certain predicates applied during stage 2. Stage 2 predicates cost more (about 10% more expensive to evaluate) that stage 1. While the optimizer over the last few versions of Db2 evaluates and processes more predicates during stage 1, Stage 2 predicates still exists. Some of these can be rewritten to stage 1, and some cannot. But we need to look for these and possible rewrite them. Not all stage 2 predicates can be rewritten as stage 1. If a query has a stage 2 predicate that it needs for logic that cannot be rewritten, that’s OK. Do not take them out of the query and handle the logic in application code.
For Example: WHERE ‘2017-01-01’ NOT BETWEEN START_DT AND END_DT can be rewritten
WHERE START_DT > ‘2017-01-01’
OR END_DT < ‘2017-01-01’
Step 4: Can the query be rewritten differently
I wrote an article and did a presentation at IDUG on this topic. The point here is that sometimes there may be 2,3,4,5,6 different ways to write a SQL statement and get back the same result set. Is 1 way always better than any of the other ways? No because it depends on a number of variables. For this article the important point is that by rewriting a query different ways it may trigger a different access path from the optimizer. With different access paths you get different runtimes. The article can be found at Advanced SQL and the Power of Rewriting Queries and the presentation can be found on ON24 or YouTube! You will see many examples of SQL rewrites.
Step 5: Does the query have a Distinct, Group By, or Order By?
These parts of the SQL statement often requires a sort from Db2. I often find many queries that do not need these statements in the query, but are left there from a copied queried. The old saying is the most efficient sort is the one that never executes. So for this step, the query needs analyzed to make sure if it contains any of these, are they needed?
Program Design: If the SQL statements I am analyzing are embedded in a program, then the next thing is to look for the number of executions on each statement during a program execution. My standards say no program should be running in a production environment without some audit detail on the number of times each query gets executed during a runtime. Why a standard? Because the more times a SQL statement gets executed from a program the longer the program runs. There is overhead in sending a SQL query to DB2 for execution and the returning of information on each one. If this can be cut down from the program flow, the faster the program will run. Besides poorly written SQL statements, this is the biggest reason programs may run too long. Program application design is very important, and the less we go to DB2 the faster our runtimes. For example:
- Take advantage of multi row fetch and multi row inserting
- Code multi table joins. Do not break up the joins into individual queries
- Sort input data so a program does not do many Select statements for the same data
- Take advantage of stored procedures, user defined functions, and SQL triggers to help put multiple SQL executions into 1 call to DB2
I find that the above steps can alleviate many performance problems in queries or programs and can all be done directly by anyone writing SQL queries. ‘Getting Empowered’ is what we want developers to become. The more they know, the more they become empowered, and the fewer problems we have in production environments. And as always make sure runtime test are conducted to make sure your changes actually improved what is currently running!
I will follow this article up with Part 2 to discuss on the topic of SQL performance.