Index-Organized Tables and Clustered Indexes


The index-only scan executes an SQL statement using only the redundant data stored in the index. The original data in the heap table is not needed. If we take that concept to the next level and put all columns into the index, you may wonder why we need the heap table.

Some databases can indeed use an index as primary table store. The Oracle database calls this concept index-organized tables (IOT), other databases use the term clustered index. In this section, both terms are used to either put the emphasis on the table or the index characteristics as needed.

An index-organized table is thus a B-tree index without a heap table. This results in two benefits: (1) it saves the space for the heap structure; (2) every access on a clustered index is automatically an index-only scan. Both benefits sound promising but are hardly achievable in practice.

About our book “SQL Performance Explained”
Just the right amount of detail for the typical SQL Developer
Chandrasekar Ravoori on Amazon.co.uk (5 stars)

The drawbacks of an index-organized table become apparent when creating another index on the same table. Analogous to a regular index, a so-called secondary index refers to the original table data—which is stored in the clustered index. There, the data is not stored statically as in a heap table but can move at any time to maintain the index order. It is therefore not possible to store the physical location of the rows in the index-organized table in the secondary index. The database must use a logical key instead.

The following figures show an index lookup for finding all sales on May 23rd 2012. For comparison, we will first look at Figure 5.2 that shows the process when using a heap table. The execution involves two steps: (1) the INDEX RANGE SCAN; (2) the TABLE ACCESS BY INDEX ROWID.

Figure 5.2. Index-Based Access on a Heap Table


Although the table access might become a bottleneck, it is still limited to one read operation per row because the index has the ROWID as a direct pointer to the table row. The database can immediately load the row from the heap table because the index has its exact position. The picture changes, however, when using a secondary index on an index-organized table. A secondary index does not store a physical pointer (ROWID) but only the key values of the clustered index—the so-called clustering key. Often that is the primary key of the index-organized table.

Why Secondary Indexes have no ROWID

A direct pointer to the table row would be desirable for a secondary index as well. But that is only possible, if the table row stays at fixed storage positions. That is, unfortunately, not possible if the row is part of an index structure, which is kept in order. Keeping the index order needs to move rows occasionally. This is also true for operations that do not affect the row itself. An insert statement, for example, might split a leaf node to gain space for the new entry. That means that some entries are moved to a new data block at a different place.

A heap table, on the other hand, doesn’t keep the rows in any order. The database saves new entries wherever it finds enough space. Once written, data doesn’t move in heap tables.

Accessing a secondary index does not deliver a ROWID but a logical key for searching the clustered index. A single access, however, is not sufficient for searching clustered index—it requires a full tree traversal. That means that accessing a table via a secondary index searches two indexes: the secondary index once (INDEX RANGE SCAN), then the clustered index for each row found in the secondary index (INDEX UNIQUE SCAN).

Figure 5.3. Secondary Index on an IOT


Figure 5.3 makes it clear, that the B-tree of the clustered index stands between the secondary index and the table data.

Accessing an index-organized table via a secondary index is very inefficient, and it can be prevented in the same way one prevents a table access on a heap table: by using an index-only scan—in this case better described as “secondary-index-only scan”. The performance advantage of an index-only scan is even bigger because it not only prevents a single access but an entire INDEX UNIQUE SCAN.

Important

Accessing an index-organized table via a secondary index is very inefficient.

Using this example we can also see that databases exploit all the redundancies they have. Bear in mind that a secondary index stores the clustering key for each index entry. Consequently, we can query the clustering key from a secondary index without accessing the index-organized table:

SELECT sale_id
  FROM sales_iot
 WHERE sale_date = ?;
-------------------------------------------------
| Id | Operation        | Name           | Cost |
-------------------------------------------------
|  0 | SELECT STATEMENT |                |    4 |
|* 1 |  INDEX RANGE SCAN| SALES_IOT_DATE |    4 |
-------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SALE_DATE"=:DT)

The table SALES_IOT is an index-organized table that uses SALE_ID as clustering key. Although the index SALE_IOT_DATE is on the SALE_DATE column only, it still has a copy of the clustering key SALE_ID so it can satisfy the query using the secondary index only.

When selecting other columns, the database has to run an INDEX UNIQUE SCAN on the clustered index for each row:

SELECT eur_value
  FROM sales_iot
 WHERE sale_date = ?;
---------------------------------------------------
| Id  | Operation         | Name           | Cost |
---------------------------------------------------
|   0 | SELECT STATEMENT  |                |   13 |
|*  1 |  INDEX UNIQUE SCAN| SALES_IOT_PK   |   13 |
|*  2 |   INDEX RANGE SCAN| SALES_IOT_DATE |    4 |
---------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("SALE_DATE"=:DT)
   2 - access("SALE_DATE"=:DT)

Index-organized tables and clustered indexes are, after all, not as useful as it seems at first sight. Performance improvements on the clustered index are easily lost on when using a secondary index. The clustering key is usually longer than a ROWID so that the secondary indexes are larger than they would be on a heap table, often eliminating the savings from the omission of the heap table. The strength of index-organized tables and clustered indexes is mostly limited to tables that do not need a second index. Heap tables have the benefit of providing a stationary master copy that can be easily referenced.

Important

Tables with one index only are best implemented as clustered indexes or index-organized tables.

Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid the table access. This gives you the select performance of a clustered index without slowing down other indexes.

Database support for index-organized tables and clustered index is very inconsistent. The following overview explains the most important specifics.

DB2

DB2 doesn’t have index-organized tables but uses the term “clustered index“ for a different feature. It uses a heap table, but tries to insert new rows in the same block as nearby rows in the index.

MySQL

The MyISAM engine only uses heap tables while the InnoDB engine always uses clustered indexes. That means you do not directly have a choice.

Oracle

The Oracle database uses heap tables by default. Index-organized tables can be created using the ORGANIZATION INDEX clause:

CREATE TABLE (
   id    NUMBER NOT NULL PRIMARY KEY,
   [...]
) ORGANIZATION INDEX;

The Oracle database always uses the primary key as the clustering key.

PostgreSQL

PostgreSQL only uses heap tables.

You can, however, use the CLUSTER clause to align the contents of the heap table with an index.

SQL Server

By default SQL Server uses clustered indexes (index-organized tables) using the primary key as clustering key. Nevertheless you can use arbitrary columns for the clustering key—even non-unique columns.

To create a heap table you must use the NONCLUSTERED clause in the primary key definition:

CREATE TABLE (
   id    NUMBER NOT NULL,
   [...]
   CONSTRAINT pk PRIMARY KEY NONCLUSTERED (id)
);

Dropping a clustered index transforms the table into a heap table.

SQL Server’s default behavior often causes performance problems when using secondary indexes.

If you like my way of explaining things, you’ll love my book.

About the Author

Photo of Markus Winand
Markus Winand tunes developers for high SQL performance. He also published the book SQL Performance Explained and offers in-house training as well as remote coaching at http://winand.at/

?Recent questions at
Ask.Use-The-Index-Luke.com

0
votes
1
answer
87
views

PostgreSQL Bitmap Heap Scan on index is very slow but Index Only Scan is fast

12 hours ago Markus Winand ♦♦ 881
index postgresql postgres sql
3
votes
2
answers
360
views

pagination with nulls

2 days ago Rocky 46
pagination
0
votes
2
answers
74
views