Index-Only Scan: Avoiding Table Access


The index-only scan belongs to the most powerful tuning methods at all. It utilizes not only the clustering of rows with similar values, but also exploits the fact that the index has a copy of the indexed data. That can avoid the table access that is usually following an index scan.

To cover an entire query, an index must contain all columns from the SQL statement. That means, the index must not only cover the where clause, but also the columns from other parts of the SQL statement—that means, also the select clause:

CREATE INDEX sales_sub_eur
    ON sales
     ( subsidiary_id, eur_value );

SELECT SUM(eur_value)
  FROM sales
 WHERE subsidiary_id = ?;

Indexing the where clause takes precedence over the other clauses, of course. Thus, SUBSIDIARY_ID at first position to qualify as an access predicate.

Coaching by the Author
Faster, easier and more memorable than reading.

The execution plan shows the index scan but no table access (TABLE ACCESS BY INDEX ROWID).

----------------------------------------------------------
| Id  | Operation         | Name          |  Rows | Cost |
----------------------------------------------------------
|   0 | SELECT STATEMENT  |               |     1 |  104 |
|   1 |  SORT AGGREGATE   |               |     1 |      |
|*  2 |   INDEX RANGE SCAN| SALES_SUB_EUR | 40388 |  104 |
----------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SUBSIDIARY_ID"=TO_NUMBER(:A))

The index covers the entire query. In this case, the index is also called a “Covering Index”.

Note

An index that prevents a table access is also called “Covering Index”.

The term is misleading, however, because it sounds like an index property—although it describes the way the index is used.

If the index would not cover the EUR_VALUE column, the database would need to fetch all selected rows from the table to calculate the total. But the table access is not necessary because the index has a copy of the required column.

An index-only scan can improve performance dramatically. Just look at the Rows estimate in the execution plan: the optimizer expects to sum about forty thousand rows. That means that the index-only scan prevents forty thousand table fetches—if each row is in a different table block. If the index has a good clustering factor—that is, if the respective rows are well clustered to a few table blocks—the advantage may be significantly lower.

The number of accessed rows is another factor that limits the performance gain of an index-only scan. Selecting a single row, for example, saves one table access only. Considering that the index tree traversal needs to fetch a few blocks as well, the saved table access might become negligible.

Important

The performance advantage of an index-only scans depends on the number of accessed rows and the index clustering factor.

The index-only scan is an aggressive indexing strategy. Do not design an index for an index-only scan on suspicion only. It needs more space and increases the index maintenance effort. See Chapter 8. In practice, you should first index without considering the select clause and extend the index later, if really needed.

Index-only scans can also cause unpleasant performance behaviour. If you extend the query, for example:

SELECT SUM(eur_value)
  FROM sales
 WHERE subsidiary_id = ?
   AND sale_date > ?;

The where clause refers to a column that is not in the index. The database can therefore not use the index with an index-only scan anymore:

--------------------------------------------------------------
|Id | Operation                    | Name      | Rows  |Cost |
--------------------------------------------------------------
| 0 | SELECT STATEMENT             |           |     1 | 371 |
| 1 |  SORT AGGREGATE              |           |     1 |     |
|*2 |   TABLE ACCESS BY INDEX ROWID| SALES     |  2019 | 371 |
|*3 |    INDEX RANGE SCAN          | SALES_DATE| 10541 |  30 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter("SUBSIDIARY_ID"=TO_NUMBER(:A))
   3 - access("SALE_DATE">:B)

Adding columns to an SQL statement can disable an index-only scan, so that the execution becomes much slower. That might seem illogical, if the user expects improved performance when fetching fewer rows.

Warning

Accessing more columns may lead to unexpected performance behaviour. Check the execution plan before extending queries.

Anything can happen if new columns prevent an index-only scan. The index previously used for an index-only scan becomes less attractive, so that the optimizer might take a different index for the new query. It actually happened above. The extended query uses an index on SALE_DATE, which is a leftover from the join chapter.

This index has two advantages for the extended query: First, it matches less rows. The optimizer estimates that about ten thousand rows match the SALE_DATE filter. The filter on SUBSIDIARY_ID matches forty thousand rows, on the other hand. The estimates are visible in the Rows column of the execution plan, but they are purely arbitrary because of the bind parameters. The SALE_DATE condition could, for example, select the entire table, when filtering for a very early date.

The second advantage of the SALES_DATE index is its clustering factor. This is a valid reason because the SALES table grows chronologically. New rows are appended to the end of the table, if there are no delete statements. The same is true for the index, if the SALE_DATE column has the current date. In other words, the table and index grow at the end. The table order will therefore correspond to the index order. The index has a good clustering factor.

A better index clustering factor means that the subsequent table access is less expensive. The new filter on SALE_DATE reduces the number of accessed rows additionally. That means that the query extension affects both factors that limit the performance gain of an index-only. It reduces the number of rows and opens a new access path that benefits from a better clustering factor. The extended query might therefore execute fast enough, even if it needs to perform the table access. Consider removing unnecessary columns or dropping the entire index in that case.

Note

Some indexes have a good clustering factor automatically. An index-only scan gains little for them.

This particular example is a lucky coincidence. The new filter does not only prevent an index-only scan, but opens a new access path at the same time. The optimizer takes the new opportunity to keep the performance impact low. The impact would be huge, on the other hand, if there wouldn’t be an index on SALE_DATE.

The same is true, if new select clause columns prevent an index-only scan. The optimizer will fall back to the next best execution plan. But there is a catch: Adding columns to the select clause cannot open a new access path.

Consider the following example. You have a query that is well indexed but still too slow. There are only few columns in the select clause, so you go for an index-only scan and add the columns form the select clause to the index. Problem solved. A year later, somebody adds another column to the select clause but doesn’t update the index. Guess what is the next best execution plan? It is the plan that was too slow before introducing the index-only scan.

Tweet this tip

Tip

Maintain your index-only scans.

Add comments that remind you about the index-only scan. Refer to this page, so everybody can read about it.

Function based indexes are another trap you might fall into when aiming for an index-only scan. An index-only scan will not work if you select the LAST_NAME column, but index the UPPER(last_name) expression. That is why the index in the previous section should have used the LAST_NAME column itself, so that it can be used to cover the select clause with an index-only scan.

Tip

Always aim to index the original data. That is often the most useful information you can put into an index.

Avoid function-based indexing if queries cannot use the expression as access predicate anyway.

Aggregate queries, like above, make good candidates for index-only scans. They query many rows, but only a few columns. That means that even a small index can be used for an index-only scan. The more columns you query, the more columns must be indexed for an index-only scan. As developer, you should therefore query only the columns you need.

Tweet this tip

Tip

Avoid select * and fetch only the columns you need.

Indexing many columns needs more space, but pushes you also towards database limits. Most databases impose rather rigid limits on the key length and the number of columns per index. That means that you cannot index an arbitrary number of columns, nor arbitrary long columns. The following overview lists the most important limits. Regardless of those limits, there are indexes that cover an entire table, as we see in the next section.

DB2

DB2 V8 limits an index to 64 column with a maximum key length of 2000 byte, reduced by an overhead that depends on the number and type of the columns.

MySQL

MySQL 5.6 with InnoDB limits each column to 767 byte, and all columns together to 3072 byte. MyISAM indexes are limited to 16 columns and a maximum key length of 1000 byte.

The warningSpecified key was too long; max key length is 767 bytes appears” during create index if an indexed column exceeds the column length limit. If the total key length is reached, the create index fails with the error message “Specified key was too long; max key length is [1000/3072] bytes.

MySQL has a special feature called “prefix indexing” (sometimes also “partial indexing”). It means to index only the beginning of a column. It is automatically applied if an index column exceeds the length limit (767 Bytes for InnoDB). That is the reason the create index statement does not fail in that case: the indexed column is automatically truncated to the maximum size. That means that the index doesn’t contain a full copy of the table data, so it might prevent index-only scans like function-based indexing.

You can also use MySQL’s prefix indexing manually to prevent exceeding the total key length limit:

CREATE INDEX .. ON employees (last_name(10));

The number in brackets after the column name is the prefix length that is indexed.

Oracle

The maximum index key length depends on the block size and the index storage parameters (75% of the database block size minus some overhead). A B-Tree index is limited to 32 columns.

The limit is at 6398 byte for Oracle 11g with all defaults (8k blocks). The respective error is “ORA-01450: maximum key length (6398) exceeded”.

PostgreSQL

PostgreSQL does not yet support index-only scans, but they will probably come with PostgreSQL 9.2.

B-Tree indexes are limited to 2713 byte (hardcoded, approx. BLCKSZ/3). The respective error message “index row size ... exceeds btree maximum, 2713” appears not until executing an insert or update that exceeds the limit. B-Tree indexes can contain up to 32 columns.

SQL Server

SQL Server limits the key length to 900 Byte and 16 key columns.

SQL Server has an extension that allows to add arbitrary long columns to an index for the sole purpose to support an index only scan. It requires a distinction between key columns and nonkey columns. Key columns are index columns as they were discussed so far. Nonkey columns are additional columns that appear the index leaf nodes only. An index may have up to 1023 nonkey columns, and there is no particular size limit. SQL Server cannot use nonkey columns for access predicates (seek predicates).

SQL Server uses the keyword include to add nonkey columns when creating an index:

 CREATE INDEX empsubupnam
     ON employees
       (subsidiary_id, last_name)
INCLUDE(phone_number, first_name);

Think

Queries that do not select any table columns are often executed as index-only scan.

Can you think of a meaningful example?

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql