The Join Operation


An SQL query walks into a bar and sees two tables.
He walks up to them and asks ’Can I join you?’

— Source: Unknown

The SQL join operation transforms normalized data from multiple tables into a denormalized form that suits a specific processing purpose. Joining is particularly sensitive to disk seek latencies because it is often accessing scattered data. Although proper indexing is the obvious solution to reduce the retrieval time, the indexing strategy depends on the join algorithm. Most SQL databases have three join algorithms to choose from, each having different performance characteristics and requiring different indexes.

Coaching by the Author
Faster, easier and more memorable than reading.

There is, however, one thing that is common to all join algorithms: they process only two tables at a time. Joining more tables requires multiple steps: first building an intermediate result set by joining two tables, then joining the result with the next table. And so forth.

Avoiding result set materialization with pipelining

Although intermediate result sets explains the algorithm very well, it does not mean that the database has to materialize it. Materializing the intermediate result set means to complete the first join, collecting the intermediate result in memory—or even disk if it is too large for memory. Instead, databases use pipelining to reduce memory usage. That means that each row from the intermediate result set is directly pipelined to the next join operation—avoiding the need to store the intermediate result set.

You might wonder if the join order makes a difference? Well, it doesn’t—not semantically. The final result is always the same, regardless of the join order. But the performance is vastly depending on the join order. The optimizer will therefore check all possible join order permutations and take the one with the lowest cost value. That means that optimizing itself—preparing the statement—becomes a bottleneck for complex statements. The more tables to join, the more execution plan variants to check—mathematically speaking: n! (Factorial). But that is only a problem if the application doesn’t use bind parameters.

Important

The more complex the statement gets, the more important bind parameters become.

Not using bind parameters is like recompiling a program every time.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql