Access Paths Part 1: The introduction of Db2 optimizer and cost model:
Posted By: Tony Andrews
Written By: IBM Data Management Console Development Group
Jun Liu, Yan Zeng, Jun Qing Huang, Bian Li
Whether they are beginners or experienced DBA, they have more or less doubts about the behavior of Db2, especially in terms of the execution performance of SQL statements. The questions they often ask are: Obviously this SQL is very simple and only needs to return very few records, why did Db2 take so long? Why is there a primary key index on this table, but Db2 doesn't seem to use it at all? Why is this SQL sometimes fast and sometimes unbearably slow, and the difference is only the input parameters on the predicate? …To answer these questions, firstly they should understand how SQL statements are processed internally by Db2 optimizer.
In a sense, SQL is a declarative language that only describes the data of interest in the program, not an algorithm to obtain the data. Therefore, there are often many ways to implement a given SQL. These different methods are known as execution plans or access paths. Although different access paths of SQL statements will produce the same result set, they are likely not to perform tasks at the same performance level. Db2 needs to consider various factors comprehensively and try to obtain the results in the best way (execution plan). At this time, Db2 needs to create an access path. The access path defines how to access the table, which indexes to use and what join method to associate the table or intermediate results, so as to finally obtain the expected results. A good or optimal access path is very important for the efficient execution of SQL statements. If one day, it is found that the execution performance of SQL is not as expected. When optimizing from the SQL level, how to understand the execution plan currently selected by Db2 has become an indispensable part such as whether the access of tables is efficient, whether appropriate indexes are selected, whether the join order between tables and the join method between tables are appropriate and so on.
- Db2 optimizer and cost model
The optimizer is the heart and soul of Db2. Functionally, it is equivalent to an expert system, a set of standard rules. Regardless of how the data is stored and manipulated, Db2 needs to access the data based on the SQL definition. Separating the data access criteria from the physical storage characteristics is called "physical data independence". The Db2 optimizer is the component that realizes this physical data independence. For example, in the case of deleting some indexes, Db2 can still access the data through a table scan, although it may not be so efficient. Another example is that you can add a new column to the table, and Db2 can still manipulate the data without changing the code of the program. All of this is possible because the physical access path to the data is not hard-coded by the programmer in the program, but automatically generated by Db2. It calls the data access path the Access path, which defines how to access the table, which indexes to use, and which join method to associate tables or intermediate results, and finally get the expected results.