Smart Logic


One of the key features of SQL databases is their support for ad-hoc queries—that is, dynamic SQL like conditional where clauses. That’s possible because the query optimizer (query planner) works at runtime; each SQL statement is analyzed when received to generate a reasonable execution plan. The overhead introduced by runtime optimization can be minimized with bind parameters—Section 3 covers that in detail.

The gist of that recap is that databases are optimized for dynamic SQL—so, use it.

Quick answers instead of long searches!
Instant Coaching is the online consulting service by the author of this page.

There is, however, one widely used practice to avoid dynamic SQL in favor of static SQL—often because of the “Dynamic SQL is slow” myth. That practice is doing more harm than good if the database uses a shared execution plan cache like DB2, Oracle, and SQL Server.

Imagine an application that queries the employees table (see Appendix C, “Example Schema”). The application allows searching for subsidiary id, employee id and last name (case-insensitive) in any combination. It is possible to write a single SQL statement that supports all possible variations using a “smart” logic to implement conditional where clauses:

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE ( subsidiary_id    = :sub_id OR :sub_id IS NULL )
   AND ( employee_id      = :emp_id OR :emp_id IS NULL )
   AND ( UPPER(last_name) = :name   OR :name   IS NULL )

The statement uses named bind variables for better readability. All possible filter expressions are statically coded in the statement. Whenever a filter isn’t needed, the respective search term is NULL so that the particular filter becomes ineffective.

It’s a perfectly reasonable SQL statement. The use of NULL is even in line with its definition according to the three valued logic of SQL. Regardless of that, it is still amongst the worst performance anti-patterns.

The databases can’t optimize for any particular filter because any of them could be canceled out. The execution plan needs to cover the worst case—that is that all filters are switched off:

----------------------------------------------------
| Id | Operation         | Name      | Rows | Cost |
----------------------------------------------------
|  0 | SELECT STATEMENT  |           |    2 |  478 |
|* 1 |  TABLE ACCESS FULL| EMPLOYEES |    2 |  478 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((:NAME   IS NULL OR UPPER("LAST_NAME")=:NAME) 
       AND (:EMP_ID IS NULL OR "EMPLOYEE_ID"=:EMP_ID) 
       AND (:SUB_ID IS NULL OR "SUBSIDIARY_ID"=:SUB_ID))

The database has to run a full table scan even if indexes exist on every individual column.

Remember that one use of bind parameters is to decrease parsing overhead—as explained in Section 3. The database will not re-evaluate the plan when the statement is executed with actual bind parameters. The prepared query plan has to cover all cases.

It’s not that the database is “stupid”. Many optimizers resolves the (tautological) expressions when inlined literally:

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE( subsidiary_id    = NULL     OR NULL IS NULL )
   AND( employee_id      = NULL     OR NULL IS NULL )
   AND( UPPER(last_name) = 'WINAND' OR 'WINAND' IS NULL )
---------------------------------------------------------------
|Id | Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT            |             |    1 |    2 |
| 1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    2 |
|*2 |   INDEX RANGE SCAN          | EMP_UP_NAME |    1 |    1 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access(UPPER("LAST_NAME")='WINAND')

This is, however, no solution. It is just the proof the database can resolve these conditions.

Warning

Using literal values in SQL exposes your application to SQL injection and causes performance problems due to parsing overhead.

The solution is to build the SQL statement dynamically. Tell the database just what you want right now—nothing more.

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE UPPER(last_name) = :name

The bind parameter prevents SQL injection and improves execution plan caching.

Tweet this tip

Tip

Use dynamic SQL if you need dynamic where clauses.

Still use bind parameters when generating dynamic SQL—otherwise, the “Dynamic SQL is Slow” myth comes true.

The problem described in this section is widespread. All databases that use a shared execution plan cache invented some features to cope with the problem. Often introducing new problems and bugs.

DB2

DB2 uses a shared execution plan cache and is fully exposed to the problem described in this section.

DB2 allows to specify the re-optimization approach using the REOPT hint. The default is NONE, which produces a generic execution plan and suffers from the problem described above. REOPT(ALWAYS) will tell the optimizer to always peek the actual bind variables to produce the best plan for each execution. That is effectively turning off execution plan caching for that statement.

The last option is REOPT(ONCE) which will peek the bind parameters for the first execution only. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires. In other words; the execution plan can change at any time.

MySQL

MySQL doesn’t suffer from this particular problem at all, because it has no execution plan cache. A feature request from 2009 discusses the impact of execution plan caching. It seems that MySQL’s optimizer is simple enough so that execution plan caching doesn’t pay off.

Oracle

Oracle uses a shared execution plan cache (SQL area) and is fully exposed to the problem described in this section.

Oracle introduced the so-called bind peeking with release 9i. Bind peeking enables the optimizer to use the actual bind values of the first execution during parsing. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires. In other words; the execution plan can change at any time.

Release 11g introduced adaptive cursor sharing to cope with the problem. This feature enables the database to have multiple execution plans for the same SQL statement. The “adaptive” approach is to run everything as usual, but to take note of the time each execution takes. In case one execution runs much slower than the others, the optimizer will create a tailor-made plan for the specific bind values. However, that tailor-made plan is created the next time the statement executes with the problematic bind values. That means that the first execution must run slow before the second execution can benefit.

PostgreSQL

PostgreSQL caches query plans only as long as the prepared statement is kept open. That means that the problem described in this section applies only if the same statement handle is re-used. Note that the default setting for JDBC’s PreparedStatement is not to re-use the query plan until its fifth execution. See also: Planning with Actual Bind Values.

SQL Server

SQL Server uses the so-called parameter sniffing at least since release 2000. Parameter sniffing enables the optimizer to use the actual bind values of the first execution during parsing. The problem with that approach is its nondeterministic behavior: whatever value was used in the first execution (e.g., since database startup) affects all executions. The execution plan can change every time the database is restarted or, more problematic, the cached plan expires and will be recompiled the next time. In other words; the execution plan can change at any time.

SQL Server 2005 added new query hits to gain more control over parameter sniffing and recompiling. The query hint RECOMPILE disables plan caching for a selected statement. OPTIMIZE FOR allows to specify actual parameter values that are used for optimization only. Finally, USE PLAN allows to specify which plan an execution should actually use.

With SQL Server 2008, it was realized and corrected that the OPTION(RECOMPILE) did not use all bind values. The new implementation was buggy, however, so that things become a little bit more complicated.

Erland Sommarskog collected all the information, for all versions of SQL Server.

Although the advanced techniques used by DB2, the Oracle database and SQL Server may improve the “smart logic” optimization to a certain extent, they actually aim to utilize column histograms.

Removing any unneeded parts from the SQL statement is still the most reliable method to get optimal performance.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql