The B-Tree: Makes the Index Fast


The index leaf nodes are stored in an arbitrary order—the position on the disk does not correspond to the logical position according to the index order. It is like a telephone directory with shuffled pages. If you search for “Smith” in but open it at “Robinson” in the first place, it is by no means granted that Smith comes farther back. Databases need a second structure to quickly find the entry among the shuffled pages: a balanced search tree—in short: B-Tree.

Figure 1.2. B-Tree Structure


Figure 1.2 shows an example index with 30 entries. The doubly linked list establishes the logical order between the leaf nodes. The root and branch nodes support quick searching among the leaf nodes.

The figure highlights a branch node and the leaf nodes it refers to. Each branch node entry corresponds to the biggest value in the respective leaf node. That is, 46 in the first leaf node so that the first branch node entry is also 46. The same is true for the other leaf nodes so that the branch node finally has the values 46, 53, 57 and 83. The other branch nodes are built in the same way.

Quick answers instead of long searches!
Instant Coaching is the online consulting service by the author of this page.

The next layer is built accordingly, but on top of the first branch node level. The procedure repeats until all keys fit into a single node, the root node. The structure is a balanced search tree because the tree depth is equal at every position; all leaf nodes have the same distance to the root node.

Note

B-Tree is for balanced tree—not binary tree.

Once created, the database maintains the index automatically. It applies every insert, delete and update to the index and keeps the tree in balance. Thus causing a maintenance overhead for write operations. Chapter 8 explains it in more detail.

Figure 1.3. B-Tree Traversal


Figure 1.3 shows an index fragment to illustrate a search for the key “57”. The tree traversal starts at the root node on the left hand side. Each entry is processed in ascending order until a value is bigger or equal (>=) to the search term (57). It is the entry 83 in the figure. The database follows the reference to the corresponding branch node and repeats the procedure until the tree traversal ends at a leaf node.

Important

The B-Tree helps to find a position in the leaf nodes.

The tree traversal is a very efficient operation—so efficient that I refer to it as the first power of indexing. It works almost instantly—even on a huge data set. That is first because of the tree balance, which allows accessing all elements with the same number of steps and second because of the logarithmic growth of the tree depth. That means that the tree depth grows very slowly compared to the number of leaf nodes. The sidebar “Logarithmic Scalability” describes this in more detail. Real world indexes with millions of records have a tree depth of four or five. A tree depth of six is hardly ever seen.

Logarithmic Scalability

In mathematics, the logarithm of a number to a given base is the power or exponent to which the base must be raised in order to produce the number [Wikipedia].

In a search tree, the base corresponds to the number of entries per branch node, and the exponent to the tree depth. The example index in Figure 1.2 holds up to four entries per node and has a tree depth of three. That means that the index can hold up to 64 (43) entries. If it grows by one level, it can already hold 256 entries (44). Each time a level is added, the maximum number of index entries quadruples. The logarithm reverses this function. The tree depth is therefore log4(Index Entries).

Tree DepthIndex Entries
364
4256
51.024
64.096
716.384
865.536
9262.144
101.048.576

The logarithmic growth enables the example index to search a million records with ten tree levels. But a real world index is even more efficient. The main factor that affects the tree depth, and therefore the lookup performance, is the number of entries in each tree node. The number of entries in each node corresponds to—mathematically speaking—the basis of the logarithm. The higher the basis, the shallower, the faster the traversal.

Databases expose this concept to a maximum extent and put as many entries as possible into each node—often hundreds. That means, every new index level supports hundred times more entries.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql