2011-05-26The Join Operation
An SQL query walks into a bar and sees two tables.
He walks up to them and asks ’Can I join you?’— Source: Unknown
The SQL join operation transforms normalized data from multiple tables into a denormalized form that suits a specific processing purpose. Joining is particularly sensitive to disk seek latencies because it is often accessing scattered data. Although proper indexing is the obvious solution to reduce the retrieval time, the indexing strategy depends on the join algorithm. Most SQL databases have three join algorithms to choose from, each having different performance characteristics and requiring different indexes.
There is, however, one thing that is common to all join algorithms: they process only two tables at a time. Joining more tables requires multiple steps: first building an intermediate result set by joining two tables, then joining the result with the next table. And so forth.
You might wonder if the join order makes a difference? Well, it doesn’t—not semantically. The final result is always the same, regardless of the join order. But the performance is vastly depending on the join order. The optimizer will therefore check all possible join order permutations and take the one with the lowest cost value. That means that optimizing itself—preparing the statement—becomes a bottleneck for complex statements. The more tables to join, the more execution plan variants to check—mathematically speaking: n! (Factorial). But that is only a problem if the application doesn’t use bind parameters.
Important
The more complex the statement gets, the more important bind parameters become.
Not using bind parameters is like recompiling a program every time.
share and subscribe
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook