2011-10-23The B-Tree: Makes the Index Fast
The index leaf nodes are stored in an arbitrary order—the position on the disk does not correspond to the logical position according to the index order. It is like a telephone directory with shuffled pages. If you search for “Smith” in but open it at “Robinson” in the first place, it is by no means granted that Smith comes farther back. Databases need a second structure to quickly find the entry among the shuffled pages: a balanced search tree—in short: B-Tree.
Figure 1.2. B-Tree Structure
Figure 1.2 shows an example index with 30 entries. The doubly linked list establishes the logical order between the leaf nodes. The root and branch nodes support quick searching among the leaf nodes.
The figure highlights a branch node and the leaf nodes it refers to. Each branch node entry corresponds to the biggest value in the respective leaf node. That is, 46 in the first leaf node so that the first branch node entry is also 46. The same is true for the other leaf nodes so that the branch node finally has the values 46, 53, 57 and 83. The other branch nodes are built in the same way.
The next layer is built accordingly, but on top of the first branch node level. The procedure repeats until all keys fit into a single node, the root node. The structure is a balanced search tree because the tree depth is equal at every position; all leaf nodes have the same distance to the root node.
Note
B-Tree is for balanced tree—not binary tree.
Once created, the database maintains the index automatically. It applies every insert, delete and update to the index and keeps the tree in balance. Thus causing a maintenance overhead for write operations. Chapter 9 explains it in more detail.
Figure 1.3. B-Tree Traversal
Figure 1.3 shows an index fragment to illustrate a search for the key “57”. The tree traversal starts at the root node on the left hand side. Each entry is processed in ascending order until a value is bigger or equal (>=) to the search term (57). It is the entry 83 in the figure. The database follows the reference to the corresponding branch node and repeats the procedure until the tree traversal ends at a leaf node.
Important
The B-Tree helps to find a position in the leaf nodes.
The tree traversal is a very efficient operation—so efficient that I refer to it as the first power of indexing. It works almost instantly—even on a huge data set. That is first because of the tree balance, which allows accessing all elements with the same number of steps and second because of the logarithmic growth of the tree depth. That means that the tree depth grows very slowly compared to the number of leaf nodes. The sidebar “Logarithmic Scalability” describes this in more detail. Real world indexes with millions of records have a tree depth of four or five. A tree depth of six is hardly ever seen.
share and subscribe
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook