Although the table access might become a bottleneck, it is still limited to one read operation per row because the index has the ROWID
as a direct pointer to the table row. The database can immediately load the row from the heap table because the index has its exact position. The picture changes, however, when using a secondary index on an index-organized table. A secondary index does not store a physical pointer (ROWID
) but only the key values of the clustered index—the so-called clustering key. Often that is the primary key of the index-organized table.
Accessing a secondary index does not deliver a ROWID
but a logical key for searching the clustered index. A single access, however, is not sufficient for searching clustered index—it requires a full tree traversal. That means that accessing a table via a secondary index searches two indexes: the secondary index once (INDEX RANGE SCAN
), then the clustered index for each row found in the secondary index (INDEX UNIQUE SCAN
).
Figure 5.3 makes it clear, that the B-tree of the clustered index stands between the secondary index and the table data.
Accessing an index-organized table via a secondary index is very inefficient, and it can be prevented in the same way one prevents a table access on a heap table: by using an index-only scan—in this case better described as “secondary-index-only scan”. The performance advantage of an index-only scan is even bigger because it not only prevents a single access but an entire INDEX UNIQUE SCAN
.
Accessing an index-organized table via a secondary index is very inefficient.
Using this example we can also see that databases exploit all the redundancies they have. Bear in mind that a secondary index stores the clustering key for each index entry. Consequently, we can query the clustering key from a secondary index without accessing the index-organized table:
SELECT sale_id
FROM sales_iot
WHERE sale_date = ?
-------------------------------------------------
| Id | Operation | Name | Cost |
-------------------------------------------------
| 0 | SELECT STATEMENT | | 4 |
|* 1 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 |
-------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SALE_DATE"=:DT)
The table SALES_IOT
is an index-organized table that uses SALE_ID
as clustering key. Although the index SALE_IOT_DATE
is on the SALE_DATE
column only, it still has a copy of the clustering key SALE_ID
so it can satisfy the query using the secondary index only.
When selecting other columns, the database has to run an INDEX UNIQUE SCAN
on the clustered index for each row:
SELECT eur_value
FROM sales_iot
WHERE sale_date = ?
---------------------------------------------------
| Id | Operation | Name | Cost |
---------------------------------------------------
| 0 | SELECT STATEMENT | | 13 |
|* 1 | INDEX UNIQUE SCAN| SALES_IOT_PK | 13 |
|* 2 | INDEX RANGE SCAN| SALES_IOT_DATE | 4 |
---------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("SALE_DATE"=:DT)
2 - access("SALE_DATE"=:DT)
Index-organized tables and clustered indexes are, after all, not as useful as it seems at first sight. Performance improvements on the clustered index are easily lost on when using a secondary index. The clustering key is usually longer than a ROWID
so that the secondary indexes are larger than they would be on a heap table, often eliminating the savings from the omission of the heap table. The strength of index-organized tables and clustered indexes is mostly limited to tables that do not need a second index. Heap tables have the benefit of providing a stationary master copy that can be easily referenced.
Tables with one index only are best implemented as clustered indexes or index-organized tables.
Tables with more indexes can often benefit from heap tables. You can still use index-only scans to avoid the table access. This gives you the select
performance of a clustered index without slowing down other indexes.
Database support for index-organized tables and clustered index is very inconsistent. The following overview explains the most important specifics.
Db2 doesn’t have index-organized tables but uses the term “clustered index“ for a different feature. It uses a heap table, but tries to insert
new rows in the same block as nearby rows in the index.
The MyISAM engine only uses heap tables while the InnoDB engine always uses clustered indexes. That means you do not directly have a choice.
The Oracle database uses heap tables by default. Index-organized tables can be created using the ORGANIZATION INDEX
clause:
CREATE TABLE (
id NUMBER NOT NULL PRIMARY KEY,
[...]
) ORGANIZATION INDEX
The Oracle database always uses the primary key as the clustering key.
PostgreSQL only uses heap tables.
You can, however, use the CLUSTER
clause to align the contents of the heap table with an index.
By default SQL Server uses clustered indexes (index-organized tables) using the primary key as clustering key. Nevertheless you can use arbitrary columns for the clustering key—even non-unique columns.
To create a heap table you must use the NONCLUSTERED
clause in the primary key definition:
CREATE TABLE (
id NUMBER NOT NULL,
[...]
CONSTRAINT pk PRIMARY KEY NONCLUSTERED (id)
)
Dropping a clustered index transforms the table into a heap table.
SQL Server’s default behavior often causes performance problems when using secondary indexes.
You can’t learn everything in one day. Subscribe the newsletter via E-Mail, Bluesky or RSS to gradually catch up. Have a look at modern-sql.com as well.
The essence of SQL tuning in 200 pages
Buy now!
(paperback and/or PDF)
Paperback also available at Amazon.com.
Markus offers SQL training and consulting for developers working at companies of all sizes.
Learn more »