Clustering Data: The Second Power of Indexing


The term “cluster” is used in various fields. A star cluster, for example, is a group of stars. A computer cluster, on the other hand, is a group of computers that work closely together—either to solve a complex problem (high-performance computing cluster) or to increase availability (failover cluster). Generally speaking, clusters are related things that appear together.

But there is one more type of cluster that applies to computing—one that is often misunderstood: data clusters. Clustering data means to store consecutively accessed data closely together, so that accessing them needs less IO operations. Data clusters are a very important database tuning concept. Computer clusters, on the other hand, are also very common in database context—thus, making the term “cluster” very unclear. The sentence “Let’s use a cluster to improve database performance” is just one example that might refer to a computer cluster, but could also mean a data cluster. In this chapter, clusters are generally referring to data clusters.

Coaching by the Author
Faster, easier and more memorable than reading.

The simplest data cluster in SQL databases is the row. Databases store all columns of a row in the same database block, if possible. Exceptions occur if a row doesn’t fit into a single block, especially if LOB types are involved.

Column Stores

Column oriented databases, or column-stores, organize tables in a column-wise fashion. It is advantageous if accessing many rows, but only a fraction of table columns—a pattern that is very common in data warehousing (OLAP).

Indexes allow to cluster data. The concept was already introduced in Chapter 1, “Anatomy of an Index”. The index leaf nodes hold the indexed column in an ordered fashion: similar values are stored next to each other. That means, that indexes build clusters of rows with similar values. This ability—to cluster data—is so important that I refer to it as the second power of indexing.

Note

The B-Tree traversal is the first power of indexing.

Clustering is the second power of indexing.

The following sections explain how to use clustering to improve query performance.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql