Case-Insensitive Search using UPPER or LOWER


Applies to
MySQLNo
OracleYes
PostgreSQLYes
SQL ServerYes

The SQL for a case-insensitive search is very simple. You can, for example, convert both sides of the comparison to all caps notation during statement execution:

SELECT first_name, last_name, phone_number
  FROM employees
 WHERE UPPER(last_name) = UPPER('winand');

Regardless of the capitalization used for the search term or the LAST_NAME column, the UPPER function makes them match as desired.

Note

Collations are another way to implement case-insensitive searches. The SQL Server and MySQL default collations are case-insensitive.

The logic of this query is perfectly reasonable, but the execution plan is not:

----------------------------------------------------
| Id | Operation         | Name      | Rows | Cost |
----------------------------------------------------
|  0 | SELECT STATEMENT  |           |   10 |  477 |
|* 1 |  TABLE ACCESS FULL| EMPLOYEES |   10 |  477 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(UPPER("LAST_NAME")='WINAND')

It is a comeback of our old friend the full table scan. Although there is an index on LAST_NAME it is unusable because the search is not on LAST_NAME—it’s on UPPER(LAST_NAME). From the database’s perspective, that’s something entirely different.

It's a book!
You are just reading a book. Here is the table of content

It is a trap we all fall into. We instantly recognize the relation between LAST_NAME and UPPER(LAST_NAME) and expect the database to “see” it as well. In fact, the optimizer’s picture is more like that:

SELECT first_name, last_name, phone_number
  FROM employees
 WHERE BLACKBOX(...) = 'WINAND';

The UPPER function is just a black box. The parameters to the function are not relevant because there is no general relationship between the function’s parameters and the result.

Tweet this tip

Tip

Replace the function name with BLACKBOX to understand the optimizer’s point of view.

Compile Time Evaluation

The optimizer evaluates the expression on the right hand side during “compile time” because all parameters are known. That is also visible in the Oracle execution plan (Predicate Information) because it shows the upper case notation of the search term only. It is very similar to a compiler that evaluates constant expressions at compile time.

To support that query, we need an index on the actual search expression; that is, a so-called function based index. Although the name suggests a special index type, it is just an ordinary B-Tree index. The only difference is that it does not use the table data directly, but uses the result from the function.

CREATE INDEX emp_up_name 
    ON employees (UPPER(last_name));

The create statement for a function based index is very similar to a regular index—there is no special keyword. Instead of a column name, you just provide an expression.

The index stores the all capitalized notation of the LAST_NAME column. It can be shown like described in the tip on index visualization (Oracle syntax):

SELECT * FROM (
  SELECT UPPER(last_name)
    FROM employees
   ORDER BY UPPER(last_name)
) 
 WHERE ROWNUM < 10;
UPPER(LAST_NAME)
--------------------
AAACH
AAAXPPKU
AABCZLTSCNM
AAGJX
AAIIARN
AAQVASLR
AASQD
AAUMEJHQUEI
ABAIHSJFYG
ABAS

The database can use a function based index if the exact expression of the index definition appears in an SQL statement—like in the example above. The execution plan confirms the index usage:

--------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |  100 |   41 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |  100 |   41 |
|*2 |  INDEX RANGE SCAN          | EMP_UP_NAME |   40 |    1 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access(UPPER("LAST_NAME")='WINAND')

It is a regular INDEX RANGE SCAN, exactly as described in Chapter 1, “Anatomy of an Index”; the tree is traversed and the leaf nodes are followed. There are no “special” operations for function based indexes just like there is no special keyword to create a function based index.

Warning

Sometimes, ORM-Tools use UPPER and LOWER without developer’s knowledge. Hibernate, for example, injects an implicit LOWER to implement case-insensitive searches.

But the execution plan is still not the same as in the previous section because the row estimates are too high. It is striking that the number of rows processed by the table access is even higher than the number of rows expected from the INDEX RANGE SCAN. How can the table access match 100 records if the preceding index scan returned only 40 rows? Well, it can’t. Contradicting estimates like that often indicate problems with the statistics. In that particular case it is because the Oracle database does not update the table statistics when creating a new index.

Oracle Statistics for Function-Based Indexes

The column statistics, which include the number of distinct column values, are part of the table statistics. That allows to use the same statistics if a column is part of multiple indexes.

Statistics for a function based index (FBI) are also kept on table level as virtual columns. Although the Oracle database collects the index statistics for the new index automatically (since release 10g), it does not update the table statistics. Therefore, the documentation recommends to update the table statistics after creating a function based index:

After creating a function-based index, collect statistics on both the index and its base table using the DBMS_STATS package. Such statistics will enable Oracle Database to correctly decide when to use the index.

My personal recommendation goes even further: update the statistics for the table and all indexes on that table. But updating all statistics may also lead to unexpected behaviour. Your should therefore coordinate that activity with your DBAs and take a backup of the original statistics before.

After updating the statistics, the execution plan has better estimates:

--------------------------------------------------------------
|Id |Operation                   | Name        | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT            |             |    1 |    3 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    3 |
|*2 |  INDEX RANGE SCAN          | EMP_UP_NAME |    1 |    1 |
--------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access(UPPER("LAST_NAME")='WINAND')

Note

Statistics for function based indexes and multi-column statistics were introduced with Oracle release 11g.

Although the execution performance is not improved by the updated statistics—the index was properly used anyway—it is always good to have a look at the optimizer’s estimates. The number of rows processed for each operation (cardinality estimate) is a particularly important figure, which is also shown in SQL Server and PostgreSQL execution plans.

Tip

Appendix A, “Execution Plans” shows the row count estimates in SQL Server and PostgreSQL execution plans.

SQL Server does not support function based indexes as such, but you can index computed columns to gain the very same effect. For that, you have to add a computed column to the table first, and can add the index on that column afterwards:

ALTER TABLE employees ADD last_name_up AS UPPER(last_name);
CREATE INDEX emp_up_name ON employees (last_name_up);

SQL Server is able to use this index, whenever the indexed expression appears in the statement. You do not need to rewrite your query to use the computed column.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql