Indexing LIKE Filters


The SQL LIKE operator very often causes unexpected performance behavior because some search terms prevent efficient index usage. That means that there are search terms that can be indexed very well, but others can not. It is the position of the wildcard characters that makes all the difference.

The following example uses the % wildcard in the middle of the search term:

SELECT first_name, last_name, date_of_birth
  FROM employees
 WHERE UPPER(last_name) LIKE 'WIN%D'
DB2
Explain Plan
----------------------------------------------------
ID | Operation         |                 Rows | Cost
 1 | RETURN            |                      |   13
 2 |  FETCH EMPLOYEES  |     1 of 1 (100.00%) |   13
 3 |   IXSCAN EMP_NAME | 1 of 10000 (   .01%) |    6

Predicate Information
 3 - START ('WIN....................................
      STOP (Q1.LAST_NAME <= 'WIN....................
      SARG (Q1.LAST_NAME LIKE 'WIN%D')

For this example, the query was changed to read WHERE last_name LIKE 'WIN%D' (no UPPER). It seems like DB2 LUW 10.5 cannot use an access predicates from LIKE on a function-based index (does a full index scan at best).

Otherwise, DB2 shines here: it clearly shows the START and STOP conditions, which consist of the part before the first wildcard, but also shows that the full pattern is applied as filter predicate.

MySQL
Try online at SQL Fiddle+----+-----------+-------+----------+---------+------+-------------+
| id | table     | type  | key      | key_len | rows | Extra       |
+----+-----------+-------+----------+---------+------+-------------+
|  1 | employees | range | emp_name | 767     |    2 | Using where |
+----+-----------+-------+----------+---------+------+-------------+
Oracle
Try online at SQL Fiddle---------------------------------------------------------------
|Id | Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT            |             |    1 |    4 |
| 1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    4 |
|*2 |   INDEX RANGE SCAN          | EMP_UP_NAME |    1 |    2 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(UPPER("LAST_NAME") LIKE 'WIN%D')
       filter(UPPER("LAST_NAME") LIKE 'WIN%D')
PostgreSQL
Try online at SQL Fiddle                       QUERY PLAN
----------------------------------------------------------
Index Scan using emp_up_name on employees
   (cost=0.01..8.29 rows=1 width=17)
   Index Cond: (upper((last_name)::text) ~>=~ 'WIN'::text)
           AND (upper((last_name)::text) ~<~  'WIO'::text)
       Filter: (upper((last_name)::text) ~~ 'WIN%D'::text)

LIKE filters can only use the characters before the first wildcard during tree traversal. The remaining characters are just filter predicates that do not narrow the scanned index range. A single LIKE expression can therefore contain two predicate types: (1) the part before the first wildcard as an access predicate; (2) the other characters as a filter predicate.

Caution

For the PostgreSQL database, you might need to specify an operator class (e.g., varchar_pattern_ops) to use LIKE expressions as access predicates. Refer to “Operator Classes and Operator Families” in the PostgreSQL documentation for further details.

The more selective the prefix before the first wildcard is, the smaller the scanned index range becomes. That, in turn, makes the index lookup faster. Figure 2.4 illustrates this relationship using three different LIKE expressions. All three select the same row, but the scanned index range—and thus the performance—is very different.

Figure 2.4. Various LIKE Searches


The first expression has two characters before the wildcard. They limit the scanned index range to 18 rows. Only one of them matches the entire LIKE expression—the other 17 are fetched but discarded. The second expression has a longer prefix that narrows the scanned index range down to two rows. With this expression, the database just reads one extra row that is not relevant for the result. The last expression does not have a filter predicate at all: the database just reads the entry that matches the entire LIKE expression.

Important

Only the part before the first wildcard serves as an access predicate.

The remaining characters do not narrow the scanned index range—non-matching entries are just left out of the result.

The opposite case is also possible: a LIKE expression that starts with a wildcard. Such a LIKE expression cannot serve as an access predicate. The database has to scan the entire table if there are no other conditions that provide access predicates.

Tweet this tip

Tip

Avoid LIKE expressions with leading wildcards (e.g., '%TERM').

The position of the wildcard characters affects index usage—at least in theory. In reality the optimizer creates a generic execution plan when the search term is supplied via bind parameters. In that case, the optimizer has to guess whether or not the majority of executions will have a leading wildcard.

About our book “SQL Performance Explained”
This book is definitively worth having in the company library.
” — Joe Celko

Most databases just assume that there is no leading wildcard when optimizing a LIKE condition with bind parameter, but this assumption is wrong if the LIKE expression is used for a full-text search. There is, unfortunately, no direct way to tag a LIKE condition as full-text search. The box “Labeling Full-Text LIKE Expressions” shows an attempt that does not work. Specifying the search term without bind parameter is the most obvious solution, but that increases the optimization overhead and opens an SQL injection vulnerability. An effective but still secure and portable solution is to intentionally obfuscate the LIKE condition. “Combining Columns” explains this in detail.

Labeling Full-Text LIKE Expressions

When using the LIKE operator for a full-text search, we could separate the wildcards from the search term:

WHERE text_column LIKE '%' || ? || '%'

The wildcards are directly written into the SQL statement, but we use a bind parameter for the search term. The final LIKE expression is built by the database itself using the string concatenation operator || (Oracle, PostgreSQL). Although using a bind parameter, the final LIKE expression will always start with a wildcard. Unfortunately databases do not recognize that.

For the PostgreSQL database, the problem is different because PostgreSQL assumes there is a leading wildcard when using bind parameters for a LIKE expression. PostgreSQL just does not use an index in that case. The only way to get an index access for a LIKE expression is to make the actual search term visible to the optimizer. If you do not use a bind parameter but put the search term directly into the SQL statement, you must take other precautions against SQL injection attacks!

Even if the database optimizes the execution plan for a leading wildcard, it can still deliver insufficient performance. You can use another part of the where clause to access the data efficiently in that case—see also “Index Filter Predicates Used Intentionally”. If there is no other access path, you might use one of the following proprietary full-text index solutions.

DB2

DB2 supports the contains keyword. See “DB2 Text Search tutorial“ at IBM developerWorks.

MySQL

MySQL offers the match and against keywords for full-text searching. Starting with MySQL 5.6, you can create full-text indexes for InnoDB tables as well—previously, this was only possible with MyISAM tables. See “Full-Text Search Functions” in the MySQL documentation.

Oracle

The Oracle database offers the contains keyword. See the “Oracle Text Application Developer’s Guide.”

PostgreSQL

PostgreSQL offers the @@ operator to implement full-text searches. See “Full Text Search” in the PostgreSQL documentation.

Another option is to use the WildSpeed extension to optimize LIKE expressions directly. The extension stores the text in all possible rotations so that each character is at the beginning once. That means that the indexed text is not only stored once but instead as many times as there are characters in the string—thus it needs a lot of space.

SQL Server

SQL Server offers the contains keyword. See “Full-Text Search” in the SQL Server documentation.

Think About It

How can you index a LIKE search that has only one wildcard at the beginning of the search term ('%TERM')?

If you like my way of explaining things, you’ll love my book.

About the Author

Photo of Markus Winand
Markus Winand tunes developers for high SQL performance. He also published the book SQL Performance Explained and offers in-house training as well as remote coaching at http://winand.at/

?Recent questions at
Ask.Use-The-Index-Luke.com

0
votes
0
answers
409
views

Join with inequalities only

Dec 16 at 12:06 Markus Winand ♦♦ 936
inequality join
0
votes
1
answer
372
views

PostgreSQL Scripts: Performance Testing and Scalability problem and question

Nov 12 at 14:53 Markus Winand ♦♦ 936
testing postgresql scalability
0
votes
1
answer
1.0k
views

PostgreSQL Bitmap Heap Scan on index is very slow but Index Only Scan is fast

Oct 31 at 11:31 Markus Winand ♦♦ 936
index postgresql postgres sql