Indexing Order By


Applies to
MySQLYes
OracleYes
PostgreSQLYes
SQL ServerYes

SQL queries with an order by clause do not need to sort the result explicitly if the relevant index already delivers the rows in the required order. That means the same index that is used for the where clause must also cover the order by clause.

As an example, consider the following query that selects yesterday’s sales ordered by sale data and product ID:

SELECT sale_date, product_id, quantity
  FROM sales
 WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
 ORDER BY sale_date, product_id;

There is already an index on SALE_DATE that can be used for the where clause. The database must, however, perform an explicit sort operation to satisfy the order by clause:

---------------------------------------------------------------
|Id | Operation                    | Name       | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT             |            |  320 |   18 |
| 1 |  SORT ORDER BY               |            |  320 |   18 |
| 2 |   TABLE ACCESS BY INDEX ROWID| SALES      |  320 |   17 |
|*3 |    INDEX RANGE SCAN          | SALES_DATE |  320 |    3 |
---------------------------------------------------------------

An INDEX RANGE SCAN delivers the result in index order anyway. To take advantage of this fact, we just have to extend the index definition so it corresponds to the order by clause:

  DROP INDEX sales_date;
CREATE INDEX sales_dt_pr ON sales (sale_date, product_id);

SELECT sale_date, product_id, quantity
  FROM sales
 WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
 ORDER BY sale_date, product_id;

---------------------------------------------------------------
|Id | Operation                   | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 | SELECT STATEMENT            |             |  320 |  300 |
| 1 |  TABLE ACCESS BY INDEX ROWID| SALES       |  320 |  300 |
|*2 |   INDEX RANGE SCAN          | SALES_DT_PR |  320 |    4 |
---------------------------------------------------------------

The sort operation SORT ORDER BY disappeared from the execution plan even though the query still has an order by clause. The database exploits the index order and skips the explicit sort operation.

Important

If the index order corresponds to the order by clause, the database can omit the explicit sort operation.

Even though the new execution plan has fewer operations, the cost value has increased considerably because the clustering factor of the new index is worse (see “Automatically Optimized Clustering Factor”). At this point, it should just be noted that the cost value is not always a good indicator of the execution effort.

Automatically Optimized Clustering Factor

The Oracle database keeps the clustering factor at a minimum by considering the ROWID for the index order. Whenever two index entries have the same key values, the ROWID decides upon their final order. The index is therefore also ordered according to the table order and thus has the smallest possible clustering factor because the ROWID represents the physical address of table row.

By adding another column to an index, you insert a new sort criterion before the ROWID. The database has less freedom in aligning the index entries according to the table order so the index clustering factor can only get worse.

Regardless, it is still possible that the index order roughly corresponds to the table order. The sales of a day are probably still clustered together in the table as well as in the index—even though their sequence is not exactly the same anymore. The database has to read the table blocks multiple times when using the SALE_DT_PR index—but these are just the same table blocks as before. Due to the caching of frequently accessed data, the performance impact could be considerably lower than indicated by the cost values.

For this optimization, it is sufficient that the scanned index range is sorted according to the order by clause. Thus the optimization also works for this particular example when sorting by PRODUCT_ID only:

SELECT sale_date, product_id, quantity
  FROM sales
 WHERE sale_date = TRUNC(sysdate) - INTERVAL '1' DAY
 ORDER BY product_id;

In Figure 6.1 we can see that the PRODUCT_ID is the only relevant sort criterion in the scanned index range. Hence the index order corresponds to the order by clause in this index range so that the database can omit the sort operation.

Figure 6.1. Sort Order in the Relevant Index Range


This optimization can cause unexpected behavior when extending the scanned index range:

SELECT sale_date, product_id, quantity
  FROM sales
 WHERE sale_date >= TRUNC(sysdate) - INTERVAL '1' DAY
 ORDER BY product_id;

This query does not retrieve yesterday’s sales but all sales since yesterday. That means it covers several days and scans an index range that is not exclusively sorted by the PRODUCT_ID. If we look at Figure 6.1 again and extend the scanned index range to the bottom, we can see that there are again smaller PRODUCT_ID values. The database must therefore use an explicit sort operation to satisfy the order by clause.

---------------------------------------------------------------
|Id |Operation                    | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT             |             |  320 |  301 |
| 1 | SORT ORDER BY               |             |  320 |  301 |
| 2 |  TABLE ACCESS BY INDEX ROWID| SALES       |  320 |  300 |
|*3 |   INDEX RANGE SCAN          | SALES_DT_PR |  320 |    4 |
---------------------------------------------------------------

If the database uses a sort operation even though you expected a pipelined execution, it can have two reasons: (1) the execution plan with the explicit sort operation has a better cost value; (2) the index order in the scanned index range does not correspond to the order by clause.

About our book “SQL Performance Explained”
Probably the best book on SQL performance I've read
Guillaume Lelarge on Amazon.co.uk (5 stars)

A simple way to tell the two cases apart is to use the full index definition in the order by clause—that means adjusting the query to the index in order to eliminate the second cause. If the database still uses an explicit sort operation, the optimizer prefers this plan due to its cost value; otherwise the database cannot use the index for the original order by clause.

Tweet this tip

Tip

Use the full index definition in the order by clause to find the reason for an explicit sort operation.

In both cases, you might wonder if and how you could possibly reach a pipelined order by execution. For this you can execute the query with the full index definition in the order by clause and inspect the result. You will often realize that you have a false perception of the index and that the index order is indeed not as required by the original order by clause so the database cannot use the index to avoid a sort operation.

If the optimizer prefers an explicit sort operation for its cost value, it is usually because the optimizer takes the best execution plan for the full execution of the query. In other words, the optimizer opts for the execution plan which is the fastest to get the last record. If the database detects that the application fetches only the first few rows, it might in turn prefer an indexed order by. Chapter 7, “Partial Results” , explains the corresponding optimization methods.

If you like my way of explaining things, you’ll love my book.

About the Author

Photo of Markus Winand
Markus Winand tunes developers for high SQL performance. He also published the book SQL Performance Explained and offers in-house training as well as remote coaching at http://winand.at/

?Recent questions at
Ask.Use-The-Index-Luke.com

3
votes
2
answers
286
views

pagination with nulls

2 days ago Markus Winand ♦♦ 771
pagination
2
votes
1
answer
1.9k
views
0
votes
2
answers
1.1k
views

different execution plans after failing over from primary to standby server

Sep 17 at 11:46 Markus Winand ♦♦ 771
oracle index update