2012-03-24Greater, Less and BETWEEN
The biggest performance risk of an INDEX RANGE SCAN is the leaf node traversal. Keeping the scanned index range small is the golden rule of indexing. to check that, you can ask yourself where the range scan starts and where it ends.
The question is easy to answer if the SQL statement shows the start and stop conditions explicitly:
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE date_of_birth >= TO_DATE(?, 'YYYY-MM-DD')
AND date_of_birth <= TO_DATE(?, 'YYYY-MM-DD')
An index on DATE_OF_BIRTH will be scanned for the specified date range only. The scan starts at the first date and ends at the second. The index range scanned for this query cannot be narrowed any further.
The answer is less obvious, if a second columns gets involved:
SELECT first_name, last_name, date_of_birth
FROM employees
WHERE date_of_birth >= TO_DATE(?, 'YYYY-MM-DD')
AND date_of_birth <= TO_DATE(?, 'YYYY-MM-DD')
AND subsidiary_id = ?
The optimal index has to cover both columns, of course. But the question is: in which order?
The following figures show the effect of the column order on the scanned index range when searching all employees of subsidiary 27, who were born between 1st and 9th January 1971.
Figure 2.2 visualizes a part of the index on DATE_OF_BIRTH and SUBSIDIARY_ID—in that order. Which range has to be scanned? Or, to put it another way: Where will the tree traversal end?
Figure 2.2. Range Scan in DATE_OF_BIRTH, SUBSIDIARY_ID index
The index is ordered by birth dates first. Only if two employees were born on the same day, the SUBSIDIARY_ID is used to sort these records. The query, however, filters for a date range. SUBSIDIARY_ID is not ordered in that range, hence, it is useless during tree traversal. The scanned index range is delimited by the conditions on DATE_OF_BIRTH only. It starts at the first entry matching the date range and ends at the last one. That are five leaf nodes in the figure.
The situation changes radically when reversing the column order so that the index starts with the SUBSIDIARY_ID. Figure 2.3 shows the search using this index.
Figure 2.3. Range Scan in SUBSIDIARY_ID, DATE_OF_BIRTH Index
The difference is that the equals operator limits the first index column to a single value (SUBSIDIARY_ID 27). The next index column is therefore sorted—at least within that SUBSIDIARY_ID value. So, there no need to visit the first leaf node because the branch node already indicates that there is no employee for subsidiary 27 born after 25th June 1969 stored in the first leaf node.
The three traversal ends at the second leaf node. Thereafter, the database follows the leaf node chain until finding a record that does not fulfill the search criterion anymore. That is, just the next leaf node entry. The range scan ends after reading a single leaf node.
Using this example, we can also falsify the myth that says the most selective column should be the first column in an index. The selectivity of each column alone is the same in both cases; there are 13 records for the date range and 13 records for the subsidiary. The selectivity does not give any advice in this case. Still, there is a definite answer how to order the columns.
The real performance difference depends on the data and the search terms. When searching for a tiny date range, the difference might be negligible. The wider the date range gets, the bigger the performance difference becomes.
To optimize performance, it is very important to know which index range is scanned. Most databases will even show it in the execution plan. You just have to know what to look for. The following execution plan, for example, indicates that the DATE_OF_BIRTH is the first in the index.
--------------------------------------------------------------
|Id | Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 4 |
|*1 | FILTER | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 4 |
|*3 | INDEX RANGE SCAN | EMP_TEST | 2 | 2 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:END_DT >= :START_DT)
3 - access(DATE_OF_BIRTH >= :START_DT
AND DATE_OF_BIRTH <= :END_DT)
filter(SUBSIDIARY_ID = :SUBS_ID)
The „Predicate Information“ for the INDEX RANGE SCAN gives the required hint. It describes the conditions of the where clause either as access or as filter predicate. This tells us how the database uses each condition.
Important
The access predicates express the start and stop conditions for an index search. They define the scanned index range.
Index filter predicates are applied during the leaf node traversal only. They do not narrow the scanned index range.
Note
The execution plan was simplified for clarity. The appendix explains the details about the „Predicate Information“ in full length.
The execution plan lists the conditions on the DATE_OF_BIRTH column as access predicates. That means, they limit the scanned index range. The DATE_OF_BIRTH is therefore the first column in the EMP_TEST index. The SUBSIDIARY_ID column is used as filter only.
The database can use all conditions from the where clause as access predicate if we turn the index definition around:
---------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 |
|* 1 | FILTER | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 3 |
|* 3 | INDEX RANGE SCAN | EMP_TEST2 | 1 | 2 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(:END_DT >= :START_DT)
3 - access(SUBSIDIARY_ID = :SUBS_ID
AND DATE_OF_BIRTH >= :START_DT
AND DATE_OF_BIRTH <= :END_T)
Finally, there is the between keyword. It allows you to specify the upper and lower bounds in one condition:
DATE_OF_BIRTH BETWEEN '01-JAN-71'
AND '10-JAN-71'
Note that between is always including the specified values. Just like using less-or-equal (<=) and greater-or-equal (>=) operators:
DATE_OF_BIRTH >= '01-JAN-71' AND DATE_OF_BIRTH <= '10-JAN-71'

share and subscribe
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook