Tuning SQL Join Performance


An SQL query walks into a bar and sees two tables.
He walks up to them and asks ’Can I join you?’

— Unknown

The SQL join operation transforms normalized data from multiple tables into a denormalized form that suits a specific processing purpose. Joining is particularly sensitive to disk seek latencies because it is often accessing scattered data. Although proper indexing is the obvious solution to reduce the retrieval time, the indexing strategy depends on the join algorithm. Most SQL databases have three join algorithms to choose from, each having different performance characteristics and requiring different indexes.

Faster, easier and more memorable than reading.

There is, however, one thing that is common to all join algorithms: they process only two tables at a time. Joining more tables requires multiple steps: first building an intermediate result set by joining two tables, then joining the result with the next table. And so forth.

Avoiding result set materialization with pipelining

Although intermediate result sets explains the algorithm very well, it does not mean that the database has to materialize it. Materializing the intermediate result set means to complete the first join, collecting the intermediate result in memory—or even disk if it is too large for memory. Instead, databases use pipelining to reduce memory usage. That means that each row from the intermediate result set is directly pipelined to the next join operation—avoiding the need to store the intermediate result set.

You might wonder if the join order makes a difference? Well, it doesn’t—not semantically. The final result is always the same, regardless of the join order. But the performance is vastly depending on the join order. The optimizer will therefore check all possible join order permutations and take the one with the lowest cost value. That means that optimizing itself—preparing the statement—becomes a bottleneck for complex statements. The more tables to join, the more execution plan variants to check—mathematically speaking: n! (Factorial). But that is only a problem if the application doesn’t use bind parameters.

Important

The more complex the statement gets, the more important bind parameters become.

Not using bind parameters is like recompiling a program every time.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
277
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql
1
vote
1
answer
210
views

Should 'id' (the primary key) be included in an index

Jan 03 at 15:24 Jan 26
index include
0
votes
1
answer
227
views

Best index for a multiple join-tables and filter

Jan 03 at 14:31 Markus Winand ♦♦ 216
index join where