Smart Logic


One of the key features of SQL databases is their support for ad-hoc queries: new queries can be executed at any time. This is only possible because the query optimizer (query planner) works at runtime; it analyzes each statement when received and generates a reasonable execution plan immediately. The overhead introduced by runtime optimization can be minimized with bind parameters.

The gist of that recap is that databases are optimized for dynamic SQL—so use it if you need it.

About our book “SQL Performance Explained”
This book is definitively worth having in the company library.
” — Joe Celko

Nevertheless there is a widely used practice that avoids dynamic SQL in favor of static SQL—often because of the “dynamic SQL is slow” myth. This practice does more harm than good if the database uses a shared execution plan cache like DB2, the Oracle database, or SQL Server.

For the sake of demonstration, imagine an application that queries the EMPLOYEES table. The application allows searching for subsidiary id, employee id and last name (case-insensitive) in any combination. It is still possible to write a single query that covers all cases by using “smart” logic.

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE ( subsidiary_id    = :sub_id OR :sub_id IS NULL )
   AND ( employee_id      = :emp_id OR :emp_id IS NULL )
   AND ( UPPER(last_name) = :name   OR :name   IS NULL )

The query uses named bind variables for better readability. All possible filter expressions are statically coded in the statement. Whenever a filter isn’t needed, you just use NULL instead of a search term: it disables the condition via the OR logic.

It is a perfectly reasonable SQL statement. The use of NULL is even in line with its definition according to the three-valued logic of SQL. Nevertheless it is one of the worst performance anti-patterns of all.

The database cannot optimize the execution plan for a particular filter because any of them could be canceled out at runtime. The database needs to prepare for the worst case—if all filters are disabled:

----------------------------------------------------
| Id | Operation         | Name      | Rows | Cost |
----------------------------------------------------
|  0 | SELECT STATEMENT  |           |    2 |  478 |
|* 1 |  TABLE ACCESS FULL| EMPLOYEES |    2 |  478 |
----------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((:NAME   IS NULL OR UPPER("LAST_NAME")=:NAME) 
       AND (:EMP_ID IS NULL OR "EMPLOYEE_ID"=:EMP_ID) 
       AND (:SUB_ID IS NULL OR "SUBSIDIARY_ID"=:SUB_ID))

As a consequence, the database uses a full table scan even if there is an index for each column.

It is not that the database cannot resolve the “smart” logic. It creates the generic execution plan due to the use of bind parameters so it can be cached and re-used with other values later on. If we do not use bind parameters but write the actual values in the SQL statement, the optimizer selects the proper index for the active filter:

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE( subsidiary_id    = NULL     OR NULL IS NULL )
   AND( employee_id      = NULL     OR NULL IS NULL )
   AND( UPPER(last_name) = 'WINAND' OR 'WINAND' IS NULL )
---------------------------------------------------------------
|Id | Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT            |             |    1 |    2 |
| 1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    2 |
|*2 |   INDEX RANGE SCAN          | EMP_UP_NAME |    1 |    1 |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  2 - access(UPPER("LAST_NAME")='WINAND')

This, however, is no solution. It just proves that the database can resolve these conditions.

Warning

Using literal values makes your application vulnerable to SQL injection attacks and can cause performance problems due to increased optimization overhead.

The obvious solution for dynamic queries is dynamic SQL. According to the KISS principle, just tell the database what you need right now—and nothing else.

SELECT first_name, last_name, subsidiary_id, employee_id
  FROM employees
 WHERE UPPER(last_name) = :name

Note that the query uses a bind parameter.

Tweet this tip

Tip

Use dynamic SQL if you need dynamic where clauses.

Still use bind parameters when generating dynamic SQL—otherwise the “dynamic SQL is slow” myth comes true.

The problem described in this section is widespread. All databases that use a shared execution plan cache have a feature to cope with it—often introducing new problems and bugs.

DB2

DB2 uses a shared execution plan cache and is fully exposed to the problem described in this section.

DB2 allows to specify the re-optimization approach using the REOPT hint. The default is NONE, which produces a generic execution plan and suffers from the problem described above. REOPT(ALWAYS) will tell the optimizer to always peek the actual bind variables to produce the best plan for each execution. That is effectively turning off execution plan caching for that statement.

The last option is REOPT(ONCE) which will peek the bind parameters for the first execution only. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.

MySQL

MySQL does not suffer from this particular problem because it has no execution plan cache at all . A feature request from 2009 discusses the impact of execution plan caching. It seems that MySQL’s optimizer is simple enough so that execution plan caching does not pay off.

Oracle

The Oracle database uses a shared execution plan cache (“SQL area”) and is fully exposed to the problem described in this section.

Oracle introduced the so-called bind peeking with release 9i. Bind peeking enables the optimizer to use the actual bind values of the first execution when preparing an execution plan. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.

Release 11g introduced adaptive cursor sharing to further improve the situation. This feature allows the database to cache multiple execution plans for the same SQL statement. Further, the optimizer peeks the bind parameters and stores their estimated selectivity along with the execution plan. When the cache is subsequently accessed, the selectivity of the current bind values must fall within the selectivity ranges of a cached execution plan to be reused. Otherwise the optimizer creates a new execution plan and compares it against the already cached execution plans for this query. If there is already such an execution plan, the database replaces it with a new execution plan that also covers the selectivity estimates of the current bind values. If not, it caches a new execution plan variant for this query — along with the selectivity estimates, of course.

PostgreSQL

The PostgreSQL query plan cache works for open statements only—that is as long as you keep the PreparedStatement open. The above described problem occurs only when re-using a statement handle. Note that PostgresSQL’s JDBC driver enables the cache after the fifth execution only. See also: Planning with Actual Bind Values.

SQL Server

SQL Server uses so-called parameter sniffing. Parameter sniffing enables the optimizer to use the actual bind values of the first execution during parsing. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.

SQL Server 2005 added new query hints to gain more control over parameter sniffing and recompiling. The query hint RECOMPILE bypasses the plan cache for a selected statement. OPTIMIZE FOR allows the specification of actual parameter values that are used for optimization only. Finally, you can provide an entire execution plan with the USE PLAN hint.

The original implementation of the OPTION(RECOMPILE) hint had a bug so it did not consider all bind variables. The new implementation introduced with SQL Server 2008 had another bug, making the situation very confusing. Erland Sommarskog has collected the all relevant information covering all SQL Server releases.

Although heuristic methods can improve the “smart logic” problem to a certain extent, they were actually built to deal with the problems of bind parameter in connection with column histograms and LIKE expressions.

The most reliable method for arriving at the best execution plan is to avoid unnecessary filters in the SQL statement.

If you like my way of explaining things, you’ll love my book.

About the Author

Photo of Markus Winand
Markus Winand tunes developers for high SQL performance. He also published the book SQL Performance Explained and offers in-house training as well as remote coaching at http://winand.at/

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
0
answers
25
views

spam removed

19 hours ago Markus Winand ♦♦ 656
spam
0
votes
1
answer
38
views

Database design suggestions for a data scraping/warehouse application?

Aug 27 at 09:29 Markus Winand ♦♦ 656
mysql optimization database
1
vote
1
answer
195
views

How to query for "previous page" with keyset pagination?

Aug 22 at 04:21 alextsg 16
pagination postgresql