
Index Filter Predicates Used Intentionally


Very often, index filter predicates indicate improper index usage caused by an incorrect column order in a concatenated index. Nevertheless, index filter predicates can be used for a good reason as well: not to improve range scan performance, but to group consecutively accessed data together.

Where clause predicates that cannot serve as access predicates are good candidates for this technique:

SELECT first_name, last_name, subsidiary_id, phone_number
  FROM employees
 WHERE subsidiary_id = ?
   AND UPPER(last_name) LIKE '%INA%'

Remember that LIKE expressions with leading wildcards cannot use the index tree. That means that indexing the last name does not narrow the scanned index range, no matter whether you index LAST_NAME or UPPER(last_name). This condition is therefore not a good candidate for indexing.


However, the condition on SUBSIDIARY_ID is well suited for indexing. We don't even need to add a new index because SUBSIDIARY_ID is already the leading column in the primary key index.
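For reference, the kind of Oracle execution plan this discussion refers to looks roughly like the following. The cost and row figures are illustrative only, chosen to match the numbers discussed below, and the primary key index is assumed to be named EMPLOYEES_PK; the exact values will differ on your system.

---------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |   17 |  230 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |   17 |  230 |
|*2 |  INDEX RANGE SCAN          | EMPLOYEES_PK|  333 |    2 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("LAST_NAME") LIKE '%INA%')
   2 - access("SUBSIDIARY_ID"=:A)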

In the above execution plan, the cost value rises a hundredfold from the INDEX RANGE SCAN to the subsequent TABLE ACCESS BY INDEX ROWID operation. In other words, the table access causes the most work. This is actually a common pattern and is not a problem by itself. Nevertheless, it is the most significant contributor to the overall execution time of this query.

The table access is not necessarily a bottleneck if the accessed rows are stored in a single table block because the database can fetch all rows with a single read operation. If the same rows are spread across many different blocks, in contrast, the table access can become a serious performance problem because the database has to fetch many blocks in order to retrieve all the rows. That means the performance depends on the physical distribution of the accessed rows—in other words: it depends on the clustering of rows.

Note

The correlation between index order and table order is a performance benchmark—the so-called index clustering factor.

It is in fact possible to improve query performance by re-ordering the rows in the table so they correspond to the index order. This method is, however, rarely applicable because you can only store the table rows in one sequence. That means you can optimize the table for one index only. Even if you can choose a single index for which you would like to optimize the table, it is still a difficult task because most databases only offer rudimentary tools for this task. So-called row sequencing is, after all, a rather impractical approach.
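As an example of such a rudimentary tool, PostgreSQL offers the CLUSTER command, which rewrites a table once in the order of a chosen index. The index name used here is an assumption; note also that the ordering is not maintained for subsequent writes, so the command has to be repeated from time to time.

CLUSTER employees USING employees_pk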

The Index Clustering Factor

The index clustering factor is an indirect measure of the probability that two succeeding index entries refer to the same table block. The optimizer takes this probability into account when calculating the cost value of the TABLE ACCESS BY INDEX ROWID operation.
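In Oracle, for example, the clustering factor the optimizer uses is visible in the data dictionary. A quick way to inspect it, using the table from the examples above (the values reflect the most recent statistics gathering):

SELECT index_name, clustering_factor, num_rows
  FROM user_indexes
 WHERE table_name = 'EMPLOYEES'

As a rule of thumb, a clustering factor close to the number of table blocks indicates well-clustered rows, while a value close to the number of rows indicates scattered rows.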

This is exactly where the second power of indexing—clustering data—comes in. You can add many columns to an index so that they are automatically stored in a well defined order. That makes an index a powerful yet simple tool for clustering data.

To apply this concept to the above query, we must extend the index to cover all columns from the where clause—even if they do not narrow the scanned index range:

CREATE INDEX empsubupnam ON employees
       (subsidiary_id, UPPER(last_name))

The column SUBSIDIARY_ID is the first index column, so it can be used as an access predicate. The expression UPPER(last_name) covers the LIKE filter as an index filter predicate. Indexing the uppercase representation saves a few CPU cycles during execution, but a straight index on LAST_NAME would work as well. You’ll find more about this in the next section.
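With the extended index in place, a plan along the following lines can be expected. Again, the cost and row estimates are illustrative and merely mirror the figures discussed in the text.

---------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |   17 |   20 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |   17 |   20 |
|*2 |  INDEX RANGE SCAN          | EMPSUBUPNAM |   17 |    3 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SUBSIDIARY_ID"=:A)
       filter(UPPER("LAST_NAME") LIKE '%INA%')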

The new execution plan shows the very same operations as before. The cost value dropped considerably nonetheless. In the predicate information we can see that the LIKE filter is already applied during the INDEX RANGE SCAN. Rows that do not fulfill the LIKE filter are immediately discarded. The table access does not have any filter predicates anymore. That means it does not load rows that do not fulfill the where clause.

The difference between the two execution plans is clearly visible in the “Rows” column. According to the optimizer’s estimate, the query ultimately matches 17 records. The index scan in the first execution plan delivers 333 rows nevertheless. The database must then load these 333 rows from the table to apply the LIKE filter which reduces the result to 17 rows. In the second execution plan, the index access does not deliver those rows in the first place so the database needs to execute the TABLE ACCESS BY INDEX ROWID operation only 17 times.

You should also note that the cost value of the INDEX RANGE SCAN operation grew from two to three because the additional column makes the index bigger. In view of the performance gain, it is an acceptable compromise.

Warning

Don’t introduce a new index for the sole purpose of filter predicates. Extend an existing index instead and keep the maintenance effort low. With some databases you can even add columns to the index for the primary key that are not part of the primary key.
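For example, Oracle can enforce a primary key with a wider, non-unique index as long as the key columns are the leading columns of that index. The following is only a sketch: the index name is made up, the extra column phone_number is chosen arbitrarily, the primary key is assumed to be (subsidiary_id, employee_id) as in the earlier examples, and the existing constraint and its index would have to be dropped first. Check your database version's rules before relying on this.

CREATE INDEX employees_pk_plus ON employees
       (subsidiary_id, employee_id, phone_number)

-- after dropping the old primary key constraint and its index:
ALTER TABLE employees
  ADD CONSTRAINT employees_pk
      PRIMARY KEY (subsidiary_id, employee_id)
      USING INDEX employees_pk_plus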

The following animation demonstrates the difference between the two execution plans:

Figure 5.1 Intentional Index Filter-Predicates

This trivial example seems to confirm the common wisdom to index every column from the where clause. This “wisdom”, however, ignores the relevance of the column order, which determines what conditions can be used as access predicates and thus has a huge impact on performance. The decision about column order should therefore never be left to chance.

The index size grows with the number of columns as well, especially when adding text columns. Of course, performance does not get better with a bigger index, even though logarithmic scalability limits the impact considerably. You should by no means add all columns mentioned in the where clause to an index; instead, use index filter predicates intentionally to reduce the data volume during an earlier execution step.
