- August 12–22: Online Training (EU shift)
- Enrollment
- FOSDEM Impressions
- June 8–18: Online Training (US shift)
- Non-monetary micro sponsoring
- November 11-12 in Frankfurt am Main
- Online-Training in July and August
- Oracle + PostgreSQL
- Party time
- PostgreSQL Performance Event
- SQL Server Performance Kurs in Stuttgart
- SQL Server performance training in London
- Shipping Terms
- The two top performance problems caused by ORM tools
- Top Tweets January 2013
- Training Survey
- Use The Index, Luke
- Ask
- Consulting
2011-07-16Smart Logic
One of the key features of SQL databases is their support for ad-hoc queries: new queries can be executed at any time. This is only possible because the query optimizer (query planner) works at runtime; it analyzes each statement when received and generates a reasonable execution plan immediately. The overhead introduced by runtime optimization can be minimized with bind parameters.
The gist of that recap is that databases are optimized for dynamic SQL—so use it if you need it.
London, 10/11 June 2013: 2 days SQL performance training with Markus Winand. Register now.
Nevertheless there is a widely used practice that avoids dynamic SQL in favor of static SQL—often because of the “dynamic SQL is slow” myth. This practice does more harm than good if the database uses a shared execution plan cache like DB2, the Oracle database, or SQL Server.
For the sake of demonstration, imagine an application that queries the EMPLOYEES table. The application allows searching for subsidiary id, employee id and last name (case-insensitive) in any combination. It is still possible to write a single query that covers all cases by using “smart” logic.
SELECT first_name, last_name, subsidiary_id, employee_id FROM employees WHERE ( subsidiary_id = :sub_id OR :sub_id IS NULL ) AND ( employee_id = :emp_id OR :emp_id IS NULL ) AND ( UPPER(last_name) = :name OR :name IS NULL )
The query uses named bind variables for better readability. All possible filter expressions are statically coded in the statement. Whenever a filter isn’t needed, you just use NULL instead of a search term: it disables the condition via the OR logic.
It is a perfectly reasonable SQL statement. The use of NULL is even in line with its definition according to the three-valued logic of SQL. Nevertheless it is one of the worst performance anti-patterns of all.
The database cannot optimize the execution plan for a particular filter because any of them could be canceled out at runtime. The database needs to prepare for the worst case—if all filters are disabled:
----------------------------------------------------
| Id | Operation | Name | Rows | Cost |
----------------------------------------------------
| 0 | SELECT STATEMENT | | 2 | 478 |
|* 1 | TABLE ACCESS FULL| EMPLOYEES | 2 | 478 |
----------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter((:NAME IS NULL OR UPPER("LAST_NAME")=:NAME)
AND (:EMP_ID IS NULL OR "EMPLOYEE_ID"=:EMP_ID)
AND (:SUB_ID IS NULL OR "SUBSIDIARY_ID"=:SUB_ID))
As a consequence, the database uses a full table scan even if there is an index for each column.
It is not that the database cannot resolve the “smart” logic. It creates the generic execution plan due to the use of bind parameters so it can be cached and re-used with other values later on. If we do not use bind parameters but write the actual values in the SQL statement, the optimizer selects the proper index for the active filter:
SELECT first_name, last_name, subsidiary_id, employee_id
FROM employees
WHERE( subsidiary_id = NULL OR NULL IS NULL )
AND( employee_id = NULL OR NULL IS NULL )
AND( UPPER(last_name) = 'WINAND' OR 'WINAND' IS NULL )
---------------------------------------------------------------
|Id | Operation | Name | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 2 |
| 1 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 2 |
|*2 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | 1 |
---------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access(UPPER("LAST_NAME")='WINAND')
This, however, is no solution. It just proves that the database can resolve these conditions.
Warning
Using literal values makes your application vulnerable to SQL injection attacks and can cause performance problems due to increased optimization overhead.
The obvious solution for dynamic queries is dynamic SQL. According to the KISS principle, just tell the database what you need right now—and nothing else.
SELECT first_name, last_name, subsidiary_id, employee_id FROM employees WHERE UPPER(last_name) = :name
Note that the query uses a bind parameter.
Tweet this tip
Tip
Use dynamic SQL if you need dynamic where clauses.
Still use bind parameters when generating dynamic SQL—otherwise the “dynamic SQL is slow” myth comes true.
The problem described in this section is widespread. All databases that use a shared execution plan cache have a feature to cope with it—often introducing new problems and bugs.
- DB2
DB2 uses a shared execution plan cache and is fully exposed to the problem described in this section.
DB2 allows to specify the re-optimization approach using the
REOPThint. The default isNONE, which produces a generic execution plan and suffers from the problem described above.REOPT(ALWAYS)will tell the optimizer to always peek the actual bind variables to produce the best plan for each execution. That is effectively turning off execution plan caching for that statement.The last option is
REOPT(ONCE)which will peek the bind parameters for the first execution only. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.- MySQL
MySQL does not suffer from this particular problem because it has no execution plan cache at all . A feature request from 2009 discusses the impact of execution plan caching. It seems that MySQL’s optimizer is simple enough so that execution plan caching does not pay off.
- Oracle
The Oracle database uses a shared execution plan cache (“SQL area”) and is fully exposed to the problem described in this section.
Oracle introduced the so-called bind peeking with release 9i. Bind peeking enables the optimizer to use the actual bind values of the first execution when preparing an execution plan. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.
Release 11g introduced adaptive cursor sharing to further improve the situation. This feature makes it possible for the database to cache multiple execution plans for the same SQL statement. The “adaptive” approach is to run everything as usual and log the execution times. If one execution runs much slower than the others, the optimizer takes note of the bind values used for this execution. The next time the statement is executed using these values the optimizer creates a customized execution plan using these bind values. That means that one execution must run slow before the second execution can benefit from a customized execution plan.
- PostgreSQL
The PostgreSQL query plan cache works for open statements only—that is as long as you keep the
PreparedStatementopen. The above described problem occurs only when re-using a statement handle. Note that PostgresSQL’s JDBC driver enables the cache after the fifth execution only. See also: Planning with Actual Bind Values.- SQL Server
SQL Server uses so-called parameter sniffing. Parameter sniffing enables the optimizer to use the actual bind values of the first execution during parsing. The problem with this approach is its nondeterministic behavior: the values from the first execution affect all executions. The execution plan can change whenever the database is restarted or, less predictably, the cached plan expires and the optimizer recreates it using different values the next time the statement is executed.
SQL Server 2005 added new query hits to gain more control over parameter sniffing and recompiling. The query hint
RECOMPILEbypasses the plan cache for a selected statement.OPTIMIZE FORallows the specification of actual parameter values that are used for optimization only. Finally, you can provide an entire execution plan with theUSE PLANhint.The original implementation of the
OPTION(RECOMPILE)hint had a bug so it did not consider all bind variables. The new implementation introduced with SQL Server 2008 had another bug, making the situation very confusing. Erland Sommarskog has collected the all relevant information covering all SQL Server releases.
Although heuristic methods can improve the “smart logic” problem to a certain extent, they were actually built to deal with the problems of bind parameter in connection with column histograms and LIKE expressions.
The most reliable method for arriving at the best execution plan is to avoid unnecessary filters in the SQL statement.
Stay connected:
RSS Feed
Like on Facebook
Follow me on Twitter
Share at Google+
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook