Very often index filter predicates indicate improper index usage caused by an incorrect column order in a concatenated index. Nevertheless index filter predicates can be used for a good reason as well—not to improve range scan performance but to group consecutively accessed data together.
Where
clause predicates that
cannot serve as access predicate are good candidates for this
technique:
SELECT first_name, last_name, subsidiary_id, phone_number
FROM employees
WHERE subsidiary_id = ?
AND UPPER(last_name) LIKE '%INA%'
Remember that LIKE
expressions with leading wildcards
cannot use the index tree. That means that
indexing LAST_NAME
doesn’t narrow the scanned index range—no
matter if you index LAST_NAME
or
UPPER(last_name)
. This condition is therefore no good
candidate for indexing.
On my Own Behalf
I offer training, tuning and consulting. Buying my book “SQL Performance Explained” (from €9.95) also supports my work on this website.
However the condition on SUBSIDIARY_ID
is well suited
for indexing. We don’t even need to add a new index because the
SUBSIDIARY_ID
is already the leading column in the index for
the primary key.
- DB2
Explain Plan ------------------------------------------------------------ ID | Operation | Rows | Cost 1 | RETURN | | 40 2 | FETCH EMPLOYEES | 33 of 333 ( 9.91%) | 40 3 | RIDSCN | 333 of 333 (100.00%) | 12 4 | SORT (UNIQUE) | 333 of 333 (100.00%) | 12 5 | IXSCAN EMPLOYEES_PK | 333 of 10000 ( 3.33%) | 12 Predicate Information 2 - SARG (Q1.SUBSIDIARY_ID = ?) SARG ( UPPER(Q1.LAST_NAME) LIKE '%INA%') 5 - START (Q1.SUBSIDIARY_ID = ?) STOP (Q1.SUBSIDIARY_ID = ?)
- Oracle
--------------------------------------------------------------- |Id | Operation | Name | Rows | Cost | --------------------------------------------------------------- | 0 | SELECT STATEMENT | | 17 | 230 | |*1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 17 | 230 | |*2 | INDEX RANGE SCAN | EMPLOYEEs_PK| 333 | 2 | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(UPPER("LAST_NAME") LIKE '%INA%') 2 - access("SUBSIDIARY_ID"=TO_NUMBER(:A))
In the above execution plan, the cost value raises a hundred times
from the INDEX RANGE SCAN
to the subsequent TABLE
ACCESS BY INDEX ROWID
operation. In other words: the table access
causes the most work. It is actually a common pattern and is not a problem
by itself. Nevertheless, it is the most significant contributor to the
overall execution time of this query.
The table access is not necessarily a bottleneck if the accessed rows are stored in a single table block because the database can fetch all rows with a single read operation. If the same rows are spread across many different blocks, in contrast, the table access can become a serious performance problem because the database has to fetch many blocks in order to retrieve all the rows. That means the performance depends on the physical distribution of the accessed rows—in other words: it depends on the clustering of rows.
Note
The correlation between index order and table order is a performance benchmark—the so-called index clustering factor.
It is in fact possible to improve query performance by re-ordering the rows in the table so they correspond to the index order. This method is, however, rarely applicable because you can only store the table rows in one sequence. That means you can optimize the table for one index only. Even if you can choose a single index for which you would like to optimize the table, it is still a difficult task because most databases only offer rudimentary tools for this task. So-called row sequencing is, after all, a rather impractical approach.
This is exactly where the second power of indexing—clustering data—comes in. You can add many columns to an index so that they are automatically stored in a well defined order. That makes an index a powerful yet simple tool for clustering data.
To apply this concept to the above query, we must extend the index
to cover all columns from the where
clause—even if they do not narrow the scanned index range:
CREATE INDEX empsubupnam ON employees
(subsidiary_id, UPPER(last_name))
The column SUBSIDIARY_ID
is the first index column so
it can be used as an access predicate. The expression
UPPER(last_name)
covers the LIKE
filter as
index filter predicate. Indexing the uppercase
representation saves a few CPU cycles during execution, but a straight
index on LAST_NAME
would work as well. You’ll find more about
this in the next section.
- DB2
Explain Plan -------------------------------------------------------- ID | Operation | Rows | Cost 1 | RETURN | | 15 2 | FETCH EMPLOYEES | 0 of 0 | 15 3 | IXSCAN EMPSUBUPNAM2 | 0 of 10000 ( .00%) | 15 Predicate Information 3 - START (Q1.SUBSIDIARY_ID = ?) STOP (Q1.SUBSIDIARY_ID = ?) SARG (Q1.LAST_NAME LIKE '%INA%')
To get the desired execution plan, I had to remove the
UPPER
from the index and thewhere
-clause. As of DB2 LUW 10.5, function based indexing is a pretty fresh feature—it seems that the optimizer is not yet fully aware of it. Further, using appropriate collations to get case-insensitive behaviour is anyway the better option in DB2.- Oracle
-------------------------------------------------------------- |Id | Operation | Name | Rows | Cost | -------------------------------------------------------------- | 0 | SELECT STATEMENT | | 17 | 20 | | 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 17 | 20 | |*2 | INDEX RANGE SCAN | EMPSUBUPNAM| 17 | 3 | -------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("SUBSIDIARY_ID"=TO_NUMBER(:A)) filter(UPPER("LAST_NAME") LIKE '%INA%')
The new execution plan shows the very same operations as before. The
cost value dropped considerably nonetheless. In the predicate information
we can see that the LIKE
filter is already applied during the
INDEX RANGE SCAN
. Rows that do not fulfill the
LIKE
filter are immediately discarded. The table access does
not have any filter predicates anymore. That means it does not load rows
that do not fulfill the where
clause.
The difference between the two execution plans is clearly visible in
the “Rows” column. According to the optimizer’s estimate, the query
ultimately matches 17 records. The index scan in the first execution plan
delivers 333 rows nevertheless. The database must then load these 333 rows
from the table to apply the LIKE
filter which reduces the
result to 17 rows. In the second execution plan, the index access does not
deliver those rows in the first place so the database needs to execute the
TABLE ACCESS BY INDEX ROWID
operation only 17 times.
You should also note that the cost value of the INDEX RANGE
SCAN
operation grew from two to three because the additional column
makes the index bigger. In view of the performance gain, it is an
acceptable compromise.
Warning
Don’t introduce a new index for the sole purpose of filter predicates. Extend an existing index instead and keep the maintenance effort low. With some databases you can even add columns to the index for the primary key that are not part of the primary key.
The following animation demonstrates the difference between the two execution plans:
Figure 5.1 Intentional Index Filter-Predicates

This trivial example seems to confirm the common wisdom to index
every column from the where
clause.
This “wisdom”, however, ignores the relevance of the column order which
determines what conditions can be used as access predicates and thus has a
huge impact on performance. The decision about column order should
therefore never be left to chance.
The index size grows with the number of columns as well—especially
when adding text columns. Of course the performance does not get better
for a bigger index even though the logarithmic
scalability limits the impact considerably. You should by no means
add all columns that are mentioned in the where
clause to an index but instead only use
index filter predicates intentionally to reduce the data volume during an
earlier execution step.
Tip
Glossary: Index filter predicates
The impact of accidental index filter predicates demonstrated
Spotting index filter predicates in Oracle, PostgreSQL and SQL Server execution plans.