
Index Filter Predicates Used Intentionally


Very often, index filter predicates indicate improper index usage caused by an incorrect column order in a concatenated index. Nevertheless, index filter predicates can be used for a good reason as well: not to improve range scan performance, but to group consecutively accessed data together.

Where clause predicates that cannot serve as access predicates are good candidates for this technique:

SELECT first_name, last_name, subsidiary_id, phone_number
  FROM employees
 WHERE subsidiary_id = ?
   AND UPPER(last_name) LIKE '%INA%'

Remember that LIKE expressions with leading wildcards cannot use the index tree. That means that indexing the last name does not narrow the scanned index range, no matter whether you index LAST_NAME or UPPER(last_name). This condition is therefore not a good candidate for indexing.


However, the condition on SUBSIDIARY_ID is well suited for indexing. We don't even need to add a new index because SUBSIDIARY_ID is already the leading column in the primary key index.
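For reference, the kind of Oracle execution plan this discussion refers to looks roughly like the following. The cost and row figures are illustrative only, chosen to match the numbers discussed below, and the primary key index is assumed to be named EMPLOYEES_PK; the exact values will differ on your system.

---------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |   17 |  230 |
|*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |   17 |  230 |
|*2 |  INDEX RANGE SCAN          | EMPLOYEES_PK|  333 |    2 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("LAST_NAME") LIKE '%INA%')
   2 - access("SUBSIDIARY_ID"=:A)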

In the above execution plan, the cost value rises a hundredfold from the INDEX RANGE SCAN to the subsequent TABLE ACCESS BY INDEX ROWID operation. In other words, the table access causes the most work. This is actually a common pattern and is not a problem by itself. Nevertheless, it is the most significant contributor to the overall execution time of this query.

The table access is not necessarily a bottleneck if the accessed rows are stored in a single table block because the database can fetch all rows with a single read operation. If the same rows are spread across many different blocks, in contrast, the table access can become a serious performance problem because the database has to fetch many blocks in order to retrieve all the rows. That means the performance depends on the physical distribution of the accessed rows—in other words: it depends on the clustering of rows.

Note

The correlation between index order and table order is a performance benchmark—the so-called index clustering factor.

It is in fact possible to improve query performance by re-ordering the rows in the table so they correspond to the index order. This method is, however, rarely applicable because you can only store the table rows in one sequence. That means you can optimize the table for one index only. Even if you can choose a single index for which you would like to optimize the table, it is still a difficult task because most databases only offer rudimentary tools for this task. So-called row sequencing is, after all, a rather impractical approach.
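As an example of such a rudimentary tool, PostgreSQL offers the CLUSTER command, which rewrites a table once in the order of a chosen index. The index name used here is an assumption; note also that the ordering is not maintained for subsequent writes, so the command has to be repeated from time to time.

CLUSTER employees USING employees_pk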

The Index Clustering Factor

The index clustering factor is an indirect measure of the probability that two succeeding index entries refer to the same table block. The optimizer takes this probability into account when calculating the cost value of the TABLE ACCESS BY INDEX ROWID operation.
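In Oracle, for example, the clustering factor the optimizer uses is visible in the data dictionary. A quick way to inspect it, using the table from the examples above (the values reflect the most recent statistics gathering):

SELECT index_name, clustering_factor, num_rows
  FROM user_indexes
 WHERE table_name = 'EMPLOYEES'

As a rule of thumb, a clustering factor close to the number of table blocks indicates well-clustered rows, while a value close to the number of rows indicates scattered rows.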

This is exactly where the second power of indexing—clustering data—comes in. You can add many columns to an index so that they are automatically stored in a well defined order. That makes an index a powerful yet simple tool for clustering data.

To apply this concept to the above query, we must extend the index to cover all columns from the where clause—even if they do not narrow the scanned index range:

CREATE INDEX empsubupnam ON employees
       (subsidiary_id, UPPER(last_name))

The column SUBSIDIARY_ID is the first index column, so it can be used as an access predicate. The expression UPPER(last_name) covers the LIKE filter as an index filter predicate. Indexing the uppercase representation saves a few CPU cycles during execution, but a straight index on LAST_NAME would work as well. You’ll find more about this in the next section.
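With the extended index in place, a plan along the following lines can be expected. Again, the cost and row estimates are illustrative and merely mirror the figures discussed in the text.

---------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |   17 |   20 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |   17 |   20 |
|*2 |  INDEX RANGE SCAN          | EMPSUBUPNAM |   17 |    3 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("SUBSIDIARY_ID"=:A)
       filter(UPPER("LAST_NAME") LIKE '%INA%')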

The new execution plan shows the very same operations as before. The cost value dropped considerably nonetheless. In the predicate information we can see that the LIKE filter is already applied during the INDEX RANGE SCAN. Rows that do not fulfill the LIKE filter are immediately discarded. The table access does not have any filter predicates anymore. That means it does not load rows that do not fulfill the where clause.

The difference between the two execution plans is clearly visible in the “Rows” column. According to the optimizer’s estimate, the query ultimately matches 17 records. The index scan in the first execution plan delivers 333 rows nevertheless. The database must then load these 333 rows from the table to apply the LIKE filter which reduces the result to 17 rows. In the second execution plan, the index access does not deliver those rows in the first place so the database needs to execute the TABLE ACCESS BY INDEX ROWID operation only 17 times.

You should also note that the cost value of the INDEX RANGE SCAN operation grew from two to three because the additional column makes the index bigger. In view of the performance gain, it is an acceptable compromise.

Warning

Don’t introduce a new index for the sole purpose of filter predicates. Extend an existing index instead and keep the maintenance effort low. With some databases you can even add columns to the index for the primary key that are not part of the primary key.
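For example, Oracle can enforce a primary key with a wider, non-unique index as long as the key columns are the leading columns of that index. The following is only a sketch: the index name is made up, the extra column phone_number is chosen arbitrarily, the primary key is assumed to be (subsidiary_id, employee_id) as in the earlier examples, and the existing constraint and its index would have to be dropped first. Check your database version's rules before relying on this.

CREATE INDEX employees_pk_plus ON employees
       (subsidiary_id, employee_id, phone_number)

-- after dropping the old primary key constraint and its index:
ALTER TABLE employees
  ADD CONSTRAINT employees_pk
      PRIMARY KEY (subsidiary_id, employee_id)
      USING INDEX employees_pk_plus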

The following animation demonstrates the difference between the two execution plans:

Figure 5.1 Intentional Index Filter-Predicates

This trivial example seems to confirm the common wisdom to index every column from the where clause. This “wisdom”, however, ignores the relevance of the column order, which determines what conditions can be used as access predicates and thus has a huge impact on performance. The decision about column order should therefore never be left to chance.

The index size grows with the number of columns as well, especially when adding text columns. Of course, performance does not get better with a bigger index, even though logarithmic scalability limits the impact considerably. You should by no means add all columns mentioned in the where clause to an index; instead, use index filter predicates intentionally to reduce the data volume during an earlier execution step.
