Planning for Re-Use
There was a discussion about bind parameters and execution plan caching in a recent reddit thread about my "Avoid Smart Logic for Conditional
WHERE clauses" page. My article mentions that bind variables make the actual values "invisible" for the optimizer resulting in a generic plan that can be cached and re-used later on. Jeff Davis emphasized that using bind parameters and the caching of execution plans are two distinct concepts and provided a PostgreSQL/Ruby example for that.
It took a week, but I can finally present my findings on that topic. The original article is already updated, but I think it is worth presenting it in a different from as well—like a FAQ:
- What is an execution plan cache?
An execution plan cache reduces load by caching prepared execution plans. If the same statement is executed later on, the plan can be re-used without the optimization overhead.
- Which databases use an execution plan cache?
DB2, Oracle and SQL Server use shared execution plan caches. That means an execution plan used in one session can be re-used by another sessions.
PostgreSQL caches query plans only as long as the prepared statement is open. The query plan is disposed when the prepared statement is closed.
MySQL has no execution plan cache (don’t confuse with Query [Result] Cache).
- What has that to do with bind parameters?
The shared execution plan caches of DB2, Oracle and SQL Server use a hash value of the literal SQL string as key to the cache. Cached plans are not found if the SQL contains literal values that vary with each execution.
Place holders (bind parameters) unify the statement so that the SQL string is identical when executed with different values—thus, increasing the cache-hit rate.
- Literal values in SQL kind of disable execution plan re-use?
In principle yes. However, there are so many applications that don’t use bind variables that database vendors built a workaround: they can turn literal values to placeholder expressions.
That means that there is a thin layer in the database that looks at every SQL string, checks for literals and transforms them to bind parameters before checking the execution plan cache. These feature is called
PARAMETERIZATIONin SQL Server. It defaults to simple—only the "simple" statements are automatically transformed to use bind parameters. The corresponding Oracle feature is
CURSOR_SHARING. It defaults to
EXACTwhich means that no magic is applied to the SQL strings.
- What’s the downside of execution plan re-use?
Execution plan caches use the SQL string as key. Same SQL string, same execution plan. That means that the optimizer must plan for re-use. It cannot assume anything about the bind values used for later executions. The smart logic approach makes that quite obvious. The following cannot optimized "away" when planning for re-use:
(last_name = :name OR :name IS NULL)
Besides that, the optimizer doesn’t consider column histograms when planning for re-use.
- What are the vendors doing about it?
The hint based approaches will essentially disable execution plan re-use for the respective statement. Oracle’s adaptive cursor sharing supports multiple execution plans per SQL string and learns from slow executions.
- Can we have both: plan re-use and well optimized queries?
Sure. Disable parametrization magic, use bind parameters and write dynamic SQL for conditional filters.
Explicitly using bind parameters solves three problems at once: (1) SQL injection, (2) execution plan cache-hit rate and (3) having manual control to utilize histograms—the exception for bind parameters. Dynamic SQL will automatically lead to well optimized execution plans for conditional filters.
- But concatenating SQL fragments is ugly!?!
Yes, it is—especially when using bind parameters. It is often not possible to provide bind values before the SQL string was passed to the database abstraction layer. That means that all the conditions need to be repeated.
Many ORM tools have a query builder that simplifies dynamic SQL a little. Some examples are available in the appendix.
Should we Build SQL Strings Dynamically in 2011, as Jeff Davis asked?
There is, in theory, no benefit from dynamic SQL if the optimizer doesn’t plan for re-use. But it won’t hurt either—except if code gets ugly. However, the problem is that there is no standard way to put an optimizer into a "disposal mode" that doesn’t plan for re-use. PostgreSQL has a proprietary API solution, SQL Server has a hint that was buggy even in 2008 R2. DB2 has a different hint seems to work since years. Oracle doesn’t give control over "execution plan recycling" (workaround), MySQL never plans for re-use.
However, all of that is irrelevant if you use dynamic SQL for conditional filters. That just works. Always.