2011-07-16Smart Logic
One of the key features of SQL databases is their support for ad-hoc queries—that is, dynamic SQL like conditional where clauses. That’s possible because the query optimizer (query planner) works at runtime; each SQL statement is analyzed when received to generate a reasonable execution plan. The overhead introduced by runtime optimization can be minimized with bind parameters—Section 3 covers that in detail.
The gist of that recap is that databases are optimized for dynamic SQL—so, use it.
There is, however, one widely used practice to avoid dynamic SQL in favor of static SQL—often because of the “Dynamic SQL is slow” myth. That practice is doing more harm than good if the database uses a shared execution plan cache like DB2, Oracle, and SQL Server.
Imagine an application that queries the employees table (see Appendix C, “Example Schema”). The application allows searching for subsidiary id, employee id and last name (case-insensitive) in any combination. It is possible to write a single SQL statement that supports all possible variations using a “smart” logic to implement conditional where clauses:
SELECT first_name, last_name, subsidiary_id, employee_id FROM employees WHERE ( subsidiary_id = :sub_id OR :sub_id IS NULL ) AND ( employee_id = :emp_id OR :emp_id IS NULL ) AND ( UPPER(last_name) = :name OR :name IS NULL )
The statement uses named bind variables for better readability. All possible filter expressions are statically coded in the statement. Whenever a filter isn’t needed, the respective search term is NULL so that the particular filter becomes ineffective.
It’s a perfectly reasonable SQL statement. The use of NULL is even in line with its definition according to the three valued logic of SQL. Regardless of that, it is still amongst the worst performance anti-patterns.
The databases can’t optimize for any particular filter because any of them could be canceled out. The execution plan needs to cover the worst case—that is that all filters are switched off:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 478 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 2 | 478 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((:NAME IS NULL OR UPPER("LAST_NAME")=:NAME)
AND (:EMP_ID IS NULL OR "EMPLOYEE_ID"=:EMP_ID)
AND (:SUB_ID IS NULL OR "SUBSIDIARY_ID"=:SUB_ID))
The database has to run a full table scan even if indexes exist on every individual column.
Remember that one use of bind parameters is to decrease parsing overhead—as explained in Section 3. The database will not re-evaluate the plan when the statement is executed with actual bind parameters. The prepared query plan has to cover all cases.
It’s not that the database is “stupid”. Many optimizers resolves the (tautological) expressions when inlined literally:
SELECT first_name, last_name, subsidiary_id, employee_id
FROM employees
WHERE( subsidiary_id = NULL OR NULL IS NULL )
AND( employee_id = NULL OR NULL IS NULL )
AND( UPPER(last_name) = 'WINAND' OR 'WINAND' IS NULL )
---------------------------------------------------------------
|Id | Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 2 |
|*2 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | 1 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("LAST_NAME")='WINAND')
This is, however, no solution. It is just the proof the database can resolve these conditions.
Warning
Using literal values in SQL exposes your application to SQL injection and causes performance problems due to parsing overhead.
The solution is to build the SQL statement dynamically. Tell the database just what you want right now—nothing more.
SELECT first_name, last_name, subsidiary_id, employee_id FROM employees WHERE UPPER(last_name) = :name
The bind parameter prevents SQL injection and improves execution plan caching.
Tip
Use dynamic SQL if you need dynamic where clauses.
Still use bind parameters when generating dynamic SQL—otherwise, the “Dynamic SQL is Slow” myth comes true.
The problem described in this section is widespread. All databases that use a shared execution plan cache invented some features to cope with the problem. Often introducing new problems and bugs.
- DB2
DB2 uses a shared execution plan cache and is fully exposed to the problem described in this section.
DB2 allows to specify the re-optimization approach using the
REOPThint. The default isNONE, which produces a generic execution plan and suffers from the problem described above.REOPT(ALWAYS)will tell the optimizer to always peek the actual bind variables to produce the best plan for each execution. That is effectively turning off execution plan caching for that statement.The last option is
REOPT(ONCE)which will peek the bind parameters for the first execution only. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires. In other words; the execution plan can change at any time.- MySQL
MySQL doesn’t suffer from this particular problem at all, because it has no execution plan cache. A feature request from 2009 discusses the impact of execution plan caching. It seems that MySQL’s optimizer is simple enough so that execution plan caching doesn’t pay off.
- Oracle
Oracle uses a shared execution plan cache (SQL area) and is fully exposed to the problem described in this section.
Oracle introduced the so-called bind peeking with release 9i. Bind peeking enables the optimizer to use the actual bind values of the first execution during parsing. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires. In other words; the execution plan can change at any time.
Release 11g introduced adaptive cursor sharing to cope with the problem. This feature enables the database to have multiple execution plans for the same SQL statement. The “adaptive” approach is to run everything as usual, but to take note of the time each execution takes. In case one execution runs much slower than the others, the optimizer will create a tailor-made plan for the specific bind values. However, that tailor-made plan is created the next time the statement executes with the problematic bind values. That means that the first execution must run slow before the second execution can benefit.
- PostgreSQL
PostgreSQL caches query plans only as long as the prepared statement is kept open. That means that the problem described in this section applies only if the same statement handle is re-used. Note that the default setting for JDBC’s
PreparedStatementis not to re-use the query plan until its fifth execution. See also: Planning with Actual Bind Values.- SQL Server
SQL Server uses the so-called parameter sniffing at least since release 2000. Parameter sniffing enables the optimizer to use the actual bind values of the first execution during parsing. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires and will be recompiled the next time. In other words; the execution plan can change at any time.
SQL Server 2005 added new query hits to gain more control over parameter sniffing and recompiling. The query hint
RECOMPILEdisables plan caching for a selected statement.OPTIMIZE FORallows to specify actual parameter values that are used for optimization only. Finally,USE PLANallows to specify which plan an execution should actually use.With SQL Server 2008, it was realized and corrected that the
OPTION(RECOMPILE)did not use all bind values. The new implementation was buggy, however, so that things become a little bit more complicated.Erland Sommarskog collected all the information, for all versions of SQL Server.
Although the advanced techniques used by DB2, the Oracle database and SQL Server may improve the “smart logic” optimization to a certain extent, they actually aim to utilize column histograms.
Removing any unneeded parts from the SQL statement is still the most reliable method to get optimal performance.

share and subscribe
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook