by Markus Winand.

How Much do You Know About PostgreSQL Indexing?


Most questions ask whether the index is the right one for the query (good fit) or not (bad fit).

If you think about changing a query to improve performance, the new query must still return the same result. That includes all columns if select * is used.

Before I Show You the Result…

Please submit your answers for analysis

I'd appreciate if you share your results with me for analysis (like in this blog post).

(Privacy)

Question 5

This question is different. First consider the following index and query:

CREATE INDEX tbl_idx ON tbl (a, date_column)
SELECT date_column, count(*)
  FROM tbl
 WHERE a = 38
 GROUP BY date_column

Let's say this query returns at least a few rows.

To implement a new functional requirement, another condition (b = 1) is added to the where clause:

SELECT date_column, count(*)
  FROM tbl
 WHERE a = 38
   AND b = 1
 GROUP BY date_column

How will the change affect performance:

  • Wrong! The query will be slower.

    The index happened to have all required data (columns) for the original query. It can run as so-called index-only scan, which doesn't need to access the actual table at all.

    Accessing any column that is not part of the index prevents this optimization so that the database must look into the actual table for each row that qualifies the original where clause to see if it also satisfies the new filter. Even if the new filter removes all rows, it does so after incurring additional work. Although the grouping has fewer rows to aggregate, this cannot compensate for the additional table look-ups.

    Tip

    Use an index-only scan for queries that access many rows but only a few columns.

    Avoid select * to increase chances for an index-only scan.

    See also: Index-Only Scan and Myth: select * is bad

  • Wrong. The provided information is sufficient to tell that the query will be slower.

    The index happened to have all required data (columns) for the original query. It can run as so-called index-only scan, which doesn't need to access the actual table at all.

    Accessing any column that is not part of the index prevents this optimization so that the database must look into the actual table for each row that qualifies the original where clause to see if it also satisfies the new filter. Even if the new filter removes all rows, it does so after incurring additional work. Although the grouping has fewer rows to aggregate, this cannot compensate for the additional table look-ups.

    Tip

    Use an index-only scan for queries that access many rows but only a few columns.

    Avoid select * to increase chances for an index-only scan.

    See also: Index-Only Scan and Myth: select * is bad

  • Right: the query will be slower.

    The index happened to have all required data (columns) for the original query. It can run as so-called index-only scan, which doesn't need to access the actual table at all.

    Accessing any column that is not part of the index prevents this optimization so that the database must look into the actual table for each row that qualifies the original where clause to see if it also satisfies the new filter. Even if the new filter removes all rows, it does so after incurring additional work. Although the grouping has fewer rows to aggregate, this cannot compensate for the additional table look-ups.

    Tip

    Use an index-only scan for queries that access many rows but only a few columns.

    Avoid select * to increase chances for an index-only scan.

    See also: Index-Only Scan and Myth: select * is bad

  • Wrong: the query will be slower!

    The index happened to have all required data (columns) for the original query. It can run as so-called index-only scan, which doesn't need to access the actual table at all.

    Accessing any column that is not part of the index prevents this optimization so that the database must look into the actual table for each row that qualifies the original where clause to see if it also satisfies the new filter. Even if the new filter removes all rows, it does so after incurring additional work. Although the grouping has fewer rows to aggregate, this cannot compensate for the additional table look-ups.

    Tip

    Use an index-only scan for queries that access many rows but only a few columns.

    Avoid select * to increase chances for an index-only scan.

    See also: Index-Only Scan and Myth: select * is bad

Question 4

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (text varchar_pattern_ops)
SELECT *
  FROM tbl
 WHERE text LIKE 'TJ%'

Your answer:

  • Although like expressions starting with a wild card character (% or _) cannot use this index efficiently, a pattern that has the wild card character at the very end can! Even if the wild card character is in the middle, the index is still useful.

    See also: A visual explanation when SQL's like is slow

  • Although like expressions starting with a wild card character (% or _) cannot use this index efficiently, a pattern that has the wild card character at the very end can! Even if the wild card character is in the middle, the index is still useful.

    See also: A visual explanation when SQL's like is slow

Question 3

Is the following index a good fit for both queries?

CREATE INDEX tbl_idx ON tbl (a, b)
SELECT *
  FROM tbl
 WHERE a = 38
   AND b = 1
SELECT *
  FROM tbl
 WHERE b = 1

Your answer:

  • The index covers the first query only, the second query cannot use the index efficiently.

    Note that the database could still read the full index end to end. Although this can be faster than reading the full table end to end, it is still not very efficient and considered not solution to this problem.

    Changing the index column order makes the index suitable for both queries—without additional overhead. The index should therefore look like this (columns exchanged):

    CREATE INDEX tbl_idx ON tbl (b, a)

    Tip

    Indexes can only be used from left to right side. If the first index column is not in the where clause, the index is of little help.

    Learn more about multi-column indexes.

  • The index covers the first query only, the second query cannot use the index efficiently.

    Note that the database could still read the full index end to end. Although this can be faster than reading the full table end to end, it is still not very efficient and considered not solution to this problem.

    Changing the index column order makes the index suitable for both queries—without additional overhead. The index should therefore look like this (columns exchanged):

    CREATE INDEX tbl_idx ON tbl (b, a)

    Tip

    Indexes can only be used from left to right side. If the first index column is not in the where clause, the index is of little help.

    Learn more about multi-column indexes.

Question 2

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (a, date_column)
SELECT *
  FROM tbl
 WHERE a = 12
 ORDER BY date_column DESC
 FETCH FIRST 1 ROW ONLY

Your answer:

Question 1

Is the following index a good fit for the query?

CREATE INDEX tbl_idx ON tbl (date_column)
SELECT COUNT(*)
  FROM tbl
 WHERE EXTRACT(YEAR FROM date_column) = 2017

Your answer:

Your Score

You got ${RESULT} of 5 right.

You should never make these mistakes again:

  • Unnecessarily using functions on indexed columns in the where clause

  • Not taking advantage of indexed top-N queries

  • Choosing a poor column order in multi-column indexes

  • Inefficient use of like filters

  • Not using index-only scans

Please find the explanation to each issue below. If you don't trust them, download the example data and inspect the execution plans on your own.

On my Own Behalf

If you like this quiz, you might also like my book SQL Performance Explained or my Training.

About the Author

Photo of Markus Winand

Markus Winand teaches efficient SQL—inhouse and online. He minimizes the development time using modern SQL and optimizes the runtime with smart indexing. His book entitled SQL Performance Explained has become standard reading.

Buy his Book on Amazon

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy on Amazon
(paperback only)

Paperback and PDF also available at Markus’ store.

Hire Markus

…to answer your current SQL questions.

The quick and easy way to benefit from his extensive knowledge and experience.
Learn more »

“Use The Index, Luke!” by Markus Winand is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
Legal | Contact | NO WARRANTY | Trademarks | Privacy | CC-BY-NC-ND 3.0 license