About Optimizer Hints
Quite often I’m asked what I think about query hints. The answer is more lengthy and probably also more two-fold than most people expect it to be. However, to answer this question once and forever, I though I should write it down.
The most important fact about query hints is that not all query hints are born equally. I distinguish two major types:
- Restricting Hints
Most query hints are restricting hints: they limit the optimizers’ freedom to choose an execution plan. “Hint” is an incredibly bad name for these things as they force the optimizer to do what it has been told—probably the reason MySQL uses the FORCE keyword for those.
I do not like restricting hints, yet I use them sometimes to test different execution plans. It usually goes like this: when I believe a different execution plan could (should?) give better performance, I just hint it to see if it really gives better performance. Quite often it becomes slower and sometimes I even realize that the execution plan I though of does not work at all—at least not with the database I’m working at that moment.
Typical examples for restricting query hints are hints that force the database to use or not use a particular index (e.g., INDEX and NO_INDEX in the Oracle database, USE INDEX and IGNORE INDEX in MySQL, or INDEX, FORCESEEK and the like in SQL Server).
So, what’s wrong with them? Well, the two main problems are that they (1) restrict the optimizer and that they (2) often need volatile object names as parameters (e.g., index names). Example: if you use a hint to use index
ABCfor a query, the hint becomes ineffective when somebody changes the name of the index to
ABCD. Further, if you restrict the optimizer you can no longer expect it to adjust the execution plan if you add another index that servers the query better. Of course there are ways around these problems. The Oracle database, for example, offers "index description" hints to avoid both issues: instead of specifying the index name, it accepts a description of the ideal index (column list) and it selects the index that matches this definition best.
Nevertheless, I strongly recommend against using restricting query hints in production. Instead you should find out why the optimizer does “the wrong thing” and fix the root cause. Restricting hints fix the symptom, not the cause. That being said, I know that there is sometimes no other reasonable choice.
- Supporting Hints
The second major type of query hints are supporting hints: they support the optimizer by providing information it doesn’t have otherwise. Supporting hints are rare—I’m only aware of a few good examples and the most useful one has already become obsolete: it’s FAST number_rows (SQL Server) and FIRST_ROWS(n) (Oracle). They tell the optimizer that the application plans to fetch only that many rows of the result. Consequently, the optimizer can prefer using indexes and nested loop joins that would be inefficient when fetching the full result (see Chapter 7, Partial Results for more details). Although being kind-of obsolete, I’m still using these hints as the defining example for supporting hints because they provide information the optimizer cannot have otherwise. This particular example is important enough that it was worth defining new keywords in the ISO SQL:2008: FETCH FIRST ... ROWS ONLY and OFFSET. That’s why this hint is a very good, yet obsolete example for supporting query hints.
Another example for supporting hints is the (undocumented)
CARDINALITYhint of the Oracle database. It basically overwrites the row count estimate of sub-queries. This hint was often used if the combined selectivity of two predicates was way off the product of the selectivity of each individual predicate (see Combined Selectivity Example). But this hint is also outdated since Oracle database 11g introduced extended statistics to cope with issues like that. SQL Server’s filtered statistics serve the same purpose. If your database cannot reflect data correlation in it’s statistics, you’ll need to fall back to restricting hints.
The Oracle hint
OPT_ESTIMATEis somehow the successor of the
CARDINALITYhint for cases when the estimations are still off. Pythian wrote a nice article about
It think supporting hints are not that bad: they are just a way to cope with known limitations of the optimizer. That’s probably why they tend to become obsolete when the optimizers evolve.
And Then There Was PostgreSQL
You might have noticed that I did not mention PostgreSQL. It’s probably because PostgreSQL doesn’t have query hints although it has (which are actually session parameters). Confused? No problem, there is a short Wiki for that.
However, to see some discussion about introducing a similar hint as
CARDINALITY described above or implementing "cross column statistics" read the first few messages in this thread from February 2011 (after the first page, the discussion moves to another direction). And the result? PostgreSQL still doesn’t have a good way to cope with the original problem of column correlation.