Indexing LIKE Filters


The SQL LIKE-Operator causes unexpected performance very often because the actual search term might prevent efficient index usage. That means, some search terms can be indexed very well, others not. The wildcard’s position makes the difference.

The following example uses the %-wildcard in the middle of the search term:

SELECT first_name, last_name, date_of_birth
  FROM employees
 WHERE UPPER(last_name) LIKE 'WIN%D'

---------------------------------------------------------------
|Id | Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT            |             |    1 |    4 |
| 1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    4 |
|*2 |   INDEX RANGE SCAN          | EMP_UP_NAME |    1 |    2 |
---------------------------------------------------------------

Only the part before the wildcard can be used during tree traversal. The remaining characters are just filter predicates, which do not narrow the scanned index range. A single LIKE-Expression can therefore contain two predicates types: (1) the part before the first wildcard as access predicate; (2) the other characters as filter predicate.

Caution

For the PostgreSQL database, you might need to specify an operator class (e.g., varchar_pattern_ops) to use LIKE expressions as access predicate. Refer to “Operator Classes and Operator Families” in the PostgreSQL documentation for further details.

The more selective the part before the first wildcard is, the smaller the scanned index range becomes. That, in turn, makes the index lookup more efficient. Figure 2.4 illustrates this relationship with three different LIKE expressions. All three select the same row. But the scanned index range is very different.

Figure 2.4. Various LIKE searches


The first expression has two characters before the wildcard. They limits the scanned index range to 18 rows. Only one of them matches the entire LIKE expression—the other 17 are discarded. The second expression has a longer prefix, which narrows the scanned index range to two rows. With this expression, the database just reads one extra row that is not relevant for the result. The last expression does not have a filter predicate at all. The database just reads the entry that matches the entire LIKE expression.

Important

Only the part before the first wildcard serves as access predicate.

The remaining characters do not narrow the scanned index range—they just discard non-matching results.

The opposite case is also possible: a LIKE expression that starts with a wildcard. Such a LIKE expression cannot serve as access predicate. The database has to scan the entire table, if the where clause does not provide another access path.

Tweet this tip

Tip

Avoid LIKE expressions with leading wildcards (e.g., '%TERM').

The wildcard’s position in the search term affects index usage—in theory, at least. In reality, the optimizer creates a generic execution plan when using bind parameters for the search term. In that case, the optimizer has to guess how the query will be executed in most cases: with or without leading wildcard?

Coaching by the Author
Faster, easier and more memorable than reading.

Most database just assume that there is no leading wildcard when optimizing a LIKE condition with bind parameter. But that assumption is wrong, if it is used for a full text search. There is, unfortunately, no direct way to tag a LIKE condition as full text search. The box “Labeling Full-Text LIKE-Expressions” shows a logical attempt that does not work. The most obvious solution is to specify the LIKE expression without bind parameter. But that increases the optimization overhead and opens an SQL injection vulnerability. An effective, but still secure and portable solution is to intentionally obfuscate the LIKE condition. “Combining Columns” explains this obfuscation in detail.

Labeling Full-Text LIKE-Expressions

For a full text search, one could specify the wildcards separately from the actual search term:

WHERE text_column LIKE '%' || ? || '%'

The wildcards are directly written into the SQL statement, but the search term is provided as bind parameter. The final LIKE expression is built by the database itself using the string concatenation operator || (Oracle, PostgreSQL). Although using a bind parameter, the final LIKE expression will always start with a wildcard. However, database do not recognize that.

For the PostgreSQL database, the problem is different because PostgreSQL assumes there is a leading wildcard when using a bind parameter for a LIKE expression. So, it just does not use an index in that case. The only way to get an index access for a LIKE expression is to provide the actual search term to the optimizer. But if you are not using a bind parameter and put the search term directly into the SQL statement, you must take other precautions against SQL injections.

Even if the execution plan is constructed for a leading wildcard, it can still deliver insufficient performance. But you can still use another part of the where clause to access the data efficiently—see also “Index Filter-Predicates Intentionally Used”. If that is not other access path, you might use one of the following, proprietary full text indexes.

DB2

DB2 supports the contains keyword. See “DB2 Text Search tutorial“ at IBM developerWorks.

MySQL

MySQL implements the match and against keywords. Starting with MySQL 5.6, they can index InnoDB tables as well—previously, they could only index MyISAM tables. See “Full-Text Search Functions” in the MySQL documentation.

Oracle

The Oracle database supports the contains keyword. See “Oracle Text Application Developer’s Guide.”

PostgreSQL

PostgreSQL uses the @@ operator to implement full text searches. See “Full Text Seach” in the PostgreSQL documentation.

Another option is using the WildSpeed extension. This is basically storing the string in all possible rotations, so that each character of the string is at the beginning. That means, that the indexed string is not stored once, but as many times as there are characters in the string—so, it needs quite some space. However, that makes it possible to effectively query the index without a wildcard prefix.

SQL Server

SQL Server implements the contains keyword. Siehe “Full-Text Seach” in the SQL Server documentation.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql