The previous section explained how to gain additional benefits from an existing index by changing its column order, but the example considered only two SQL statements. Changing an index, however, may affect all queries on the indexed table. This section explains the way databases pick an index and demonstrates the possible side effects when changing existing indexes.
The adopted EMPLOYEES_PK
index improves the performance of all queries that search by subsidiary only. It is however usable for all queries that search by SUBSIDIARY_ID
—regardless of whether there are any additional search criteria. That means the index becomes usable for queries that used to use another index with another part of the where
clause. In that case, if there are multiple access paths available it is the optimizer’s job to choose the best one.
Changing an index might have unpleasant side effects as well. In our example, it is the internal telephone directory application that has become very slow since the merger. The first analysis identified the following query as the cause for the slowdown:
SELECT first_name, last_name, subsidiary_id, phone_number
FROM employees
WHERE last_name = 'WINAND'
AND subsidiary_id = 30
The execution plan is:
Example 2.1 Execution Plan with Revised Primary Key Index
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 30 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 30 |
|*2 | INDEX RANGE SCAN | EMPLOYEES_PK | 40 | 2 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LAST_NAME"='WINAND')
2 - access("SUBSIDIARY_ID"=30)
The execution plan uses an index and has an overall cost value of 30. So far, so good. It is however suspicious that it uses the index we just changed—that is enough reason to suspect that our index change caused the performance problem, especially when bearing the old index definition in mind—it started with the EMPLOYEE_ID
column which is not part of the where
clause at all. The query could not use that index before.
For further analysis, it would be nice to compare the execution plan before and after the change. To get the original execution plan, we could just deploy the old index definition again, however most databases offer a simpler method to prevent using an index for a specific query. The following example uses an Oracle optimizer hint for that purpose.
SELECT /*+ NO_INDEX(EMPLOYEES EMPLOYEES_PK) */
first_name, last_name, subsidiary_id, phone_number
FROM employees
WHERE last_name = 'WINAND'
AND subsidiary_id = 30
The execution plan that was presumably used before the index change did not use an index at all:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 477 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 1 | 477 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LAST_NAME"='WINAND' AND "SUBSIDIARY_ID"=30)
Even though the TABLE ACCESS FULL
must read and process the entire table, it seems to be faster than using the index in this case. That is particularly unusual because the query matches one row only. Using an index to find a single row should be much faster than a full table scan, but in this case it is not. The index seems to be slow.
In such cases it is best to go through each step of the troublesome execution plan. The first step is the INDEX RANGE SCAN
on the EMPLOYEES_PK
index. That index does not cover the LAST_NAME
column—the INDEX RANGE SCAN
can consider the SUBSIDIARY_ID
filter only; the Oracle database shows this in the “Predicate Information” area—entry “2” of the execution plan. There you can see the conditions that are applied for each operation.
Tip
Appendix A, “Execution Plans”, explains how to find the “Predicate Information” for other databases.
The INDEX RANGE SCAN
with operation ID 2 (Example 2.1) applies only the SUBSIDIARY_ID=30
filter. That means that it traverses the index tree to find the first entry for SUBSIDIARY_ID
30. Next it follows the leaf node chain to find all other entries for that subsidiary. The result of the INDEX RANGE SCAN
is a list of ROWIDs
that fulfill the SUBSIDIARY_ID
condition: depending on the subsidiary size, there might be just a few ones or there could be many hundreds.
The next step is the TABLE ACCESS BY INDEX ROWID
operation. It uses the ROWIDs
from the previous step to fetch the rows—all columns—from the table. Once the LAST_NAME
column is available, the database can evaluate the remaining part of the where
clause. That means the database has to fetch all rows for SUBSIDIARY_ID=30
before it can apply the LAST_NAME
filter.
The statement’s response time does not depend on the result set size but on the number of employees in the particular subsidiary. If the subsidiary has just a few members, the INDEX RANGE SCAN
provides better performance. Nonetheless a TABLE ACCESS FULL
can be faster for a huge subsidiary because it can read large parts from the table in one shot (see “Full Table Scan”).
On my Own Behalf
I make my living from SQL training, SQL tuning and consulting and my book “SQL Performance Explained”. Learn more at https://winand.at/.
The query is slow because the index lookup returns many ROWIDs
—one for each employee of the original company—and the database must fetch them individually. It is the perfect combination of the two ingredients that make an index slow: the database reads a wide index range and has to fetch many rows individually.
Choosing the best execution plan depends on the table’s data distribution as well so the optimizer uses statistics about the contents of the database. In our example, a histogram containing the distribution of employees over subsidiaries is used. This allows the optimizer to estimate the number of rows returned from the index lookup—the result is used for the cost calculation.
If there are no statistics available—for example because they were deleted—the optimizer uses default values. The default statistics of the Oracle database suggest a small index with medium selectivity. They lead to the estimate that the INDEX RANGE SCAN
will return 40 rows. The execution plan shows this estimation in the Rows column (again, see Example 2.1). Obviously this is a gross underestimate, as there are 1000 employees working for this subsidiary.
If we provide correct statistics, the optimizer does a better job. The following execution plan shows the new estimation: 1000 rows for the INDEX RANGE SCAN
. Consequently it calculated a higher cost value for the subsequent table access.
---------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 680 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 680 |
|*2 | INDEX RANGE SCAN | EMPLOYEES_PK | 1000 | 4 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("LAST_NAME"='WINAND')
2 - access("SUBSIDIARY_ID"=30)
The cost value of 680 is even higher than the cost value for the execution plan using the FULL TABLE SCAN
(477). The optimizer will therefore automatically prefer the FULL TABLE SCAN
.
On my Own Behalf
I make my living from SQL training, SQL tuning and consulting and my book “SQL Performance Explained”. Learn more at https://winand.at/.
This example of a slow index should not hide the fact that proper indexing is the best solution. Of course searching on last name is best supported by an index on LAST_NAME
:
CREATE INDEX emp_name ON employees (last_name)
Using the new index, the optimizer calculates a cost value of 3:
Example 2.2 Execution Plan with Dedicated Index
--------------------------------------------------------------
| Id | Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 3 |
|* 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 3 |
|* 2 | INDEX RANGE SCAN | EMP_NAME | 1 | 1 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("SUBSIDIARY_ID"=30)
2 - access("LAST_NAME"='WINAND')
The index access delivers—according to the optimizer’s estimation—one row only. The database thus has to fetch only that row from the table: this is definitely faster than a FULL TABLE SCAN
. A properly defined index is still better than the original full table scan.
The two execution plans from Example 2.1 and Example 2.2 are almost identical. The database performs the same operations and the optimizer calculated similar cost values, nevertheless the second plan performs much better. The efficiency of an INDEX RANGE SCAN
may vary over a wide range—especially when followed by a table access. Using an index does not automatically mean a statement is executed in the best way possible.