- August 12–22: Online Training (EU shift)
- Enrollment
- FOSDEM Impressions
- June 8–18: Online Training (US shift)
- Non-monetary micro sponsoring
- November 11-12 in Frankfurt am Main
- Online-Training in July and August
- Oracle + PostgreSQL
- Party time
- PostgreSQL Performance Event
- SQL Server Performance Kurs in Stuttgart
- SQL Server performance training in London
- Shipping Terms
- The two top performance problems caused by ORM tools
- Top Tweets January 2013
- Training Survey
- Training and Conference Dates
- Use The Index, Luke
- Ask
- Consulting
2012-03-13Case-Insensitive Search Using UPPER or LOWER
Applies to
MySQLNo
OracleYes
PostgreSQLYes
SQL ServerYes
Ignoring the case in a where clause is very simple. You can, for example, convert both sides of the comparison to all caps notation:
SELECT first_name, last_name, phone_number
FROM employees
WHERE UPPER(last_name) = UPPER('winand');
Regardless of the capitalization used for the search term or the LAST_NAME column, the UPPER function makes them match as desired.
Note
Another way for case-insensitive matching is to use a different “collation”. The default collations used by SQL Server and MySQL do not distinguish between upper and lower case letters—they are case-insensitive by default.
The logic of this query is perfectly reasonable but the execution plan is not:
---------------------------------------------------- | Id | Operation | Name | Rows | Cost | ---------------------------------------------------- | 0 | SELECT STATEMENT | | 10 | 477 | |* 1 | TABLE ACCESS FULL| EMPLOYEES | 10 | 477 | ---------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(UPPER("LAST_NAME")='WINAND')
It is a return of our old friend the full table scan. Although there is an index on LAST_NAME, it is unusable—because the search is not on LAST_NAME but on UPPER(LAST_NAME). From the database’s perspective, that’s something entirely different.
Show some Love!
Buying the PDF supports our work on this site.
And you'll get a beautiful e-book.
And you'll get a beautiful e-book.
This is a trap we all might fall into. We recognize the relation between LAST_NAME and UPPER(LAST_NAME) instantly and expect the database to “see” it as well. In reality the optimizer’s view is more like this:
SELECT first_name, last_name, phone_number
FROM employees
WHERE BLACKBOX(...) = 'WINAND';
The UPPER function is just a black box. The parameters to the function are not relevant because there is no general relationship between the function’s parameters and the result.
Tweet this tip
Tip
Replace the function name with BLACKBOX to understand the optimizer’s point of view.
To support that query, we need an index that covers the actual search term. That means we do not need an index on LAST_NAME but on UPPER(LAST_NAME):
CREATE INDEX emp_up_name
ON employees (UPPER(last_name));
An index whose definition contains functions or expressions is a so-called function-based index (FBI). Instead of copying the column data directly into the index, a function-based index applies the function first and puts the result into the index. As a result, the index stores the names in all caps notation.
The database can use a function-based index if the exact expression of the index definition appears in an SQL statement—like in the example above. The execution plan confirms this:
-------------------------------------------------------------- |Id |Operation | Name | Rows | Cost | -------------------------------------------------------------- | 0 |SELECT STATEMENT | | 100 | 41 | | 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 100 | 41 | |*2 | INDEX RANGE SCAN | EMP_UP_NAME | 40 | 1 | -------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access(UPPER("LAST_NAME")='WINAND')
It is a regular INDEX RANGE SCAN as described in Chapter 1. The database traverses the B-tree and follows the leaf node chain. There are no dedicated operations or keywords for function-based indexes.
Warning
Sometimes ORM tools use UPPER and LOWER without the developer’s knowledge. Hibernate, for example, injects an implicit LOWER for case-insensitive searches.
The execution plan is not yet the same as it was in the previous section without UPPER; the row count estimate is too high. It is particularly strange that the optimizer expects to fetch more rows from the table than the INDEX RANGE SCAN delivers in the first place. How can it fetch 100 rows from the table if the preceding index scan returned only 40 rows? The answer is that it can not. Contradicting estimates like this often indicate problems with the statistics. In this particular case it is because the Oracle database does not update the table statistics when creating a new index (see also “Oracle Statistics for Function-Based Indexes”).
After updating the statistics, the optimizer calculates more accurate estimates:
--------------------------------------------------------------
|Id |Operation | Name | Rows | Cost |
--------------------------------------------------------------
| 0 |SELECT STATEMENT | | 1 | 3 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 3 |
|*2 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | 1 |
--------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("LAST_NAME")='WINAND')
Note
Statistics for function-based indexes and multi-column statistics were introduced with Oracle release 11g.
Although the updated statistics do not improve execution performance in this case—the index was properly used anyway—it is always a good idea to check the optimizer’s estimates. The number of rows processed for each operation (cardinality estimate) is a particularly important figure that is also shown in SQL Server and PostgreSQL execution plans.
Tip
Appendix A, “Execution Plans”, describes the row count estimates in SQL Server and PostgreSQL execution plans.
SQL Server does not support function-based indexes as described but it does offer computed columns that can be used instead. To make use of this, you have to first add a computed column to the table that can be indexed afterwards:
ALTER TABLE employees ADD last_name_up AS UPPER(last_name); CREATE INDEX emp_up_name ON employees (last_name_up);
SQL Server is able to use this index whenever the indexed expression appears in the statement. You do not need to rewrite your query to use the computed column.
Stay connected:
RSS Feed
Like on Facebook
Follow me on Twitter
Share at Google+
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook