by Markus Winand.

PostgreSQL Example Scripts for “3-Minute Test”


This section contains the create, insert and select statements for the “Test your SQL Know-How in 3 Minutes” test. You may want to test yourself before reading this page.

The execution plans shown are abbreviated for better readability.

Table Setup

CREATE TABLE tbl (
  id          NUMERIC NOT NULL,
  date_column DATE,
  a           NUMERIC,
  b           NUMERIC,
  text        VARCHAR(255),
  state       CHAR(1),
  PRIMARY KEY (id)
);

CREATE OR REPLACE FUNCTION random_string(minlen NUMERIC, maxlen NUMERIC)
RETURNS VARCHAR(1000)
AS
$$
DECLARE
  rv VARCHAR(1000) := '';
  i  INTEGER := 0;
  len INTEGER := 0;
BEGIN
  IF maxlen < 1 OR minlen < 1 OR maxlen < minlen THEN
    RETURN rv;
  END IF;

  len := floor(random()*(maxlen-minlen)) + minlen;

  FOR i IN 1..floor(len) LOOP
    rv := rv || chr(97+CAST(random() * 25 AS INTEGER));
  END LOOP;
  RETURN rv;
END;
$$ LANGUAGE plpgsql;

INSERT INTO tbl
SELECT generate_series
     , CURRENT_DATE - generate_series
     , MOD(generate_series,1234)
     , CAST(RANDOM() * 10 AS NUMERIC), random_string(20, 20)
     , CASE MOD(generate_series,5) WHEN 0 THEN 'X' ELSE 'A' END
FROM generate_series(1, 50000);

--VACUUM ANALYZE tbl;

Question 1 — DATE Anit-Pattern

CREATE INDEX tbl_idx ON tbl (date_column);
SELECT COUNT(*)
  FROM tbl
 WHERE EXTRACT(YEAR FROM date_column) = 2016;
SELECT COUNT(*)
  FROM tbl
 WHERE date_column >= DATE'2016-01-01'
   AND date_column <  DATE'2017-01-01';

The first execution plan performs a full table scan (Seq Scan). The second execution plan, on the other hand, uses the index.

                             QUERY PLAN
--------------------------------------------------------------------
 Aggregate (actual time=13.842..13.843 rows=1)
   Buffers: shared hit=516
   ->  Seq Scan on tbl  (actual time=0.070..13.807 rows=366)
         Filter: (date_part('year', date_column) = 2016)
         Rows Removed by Filter: 49634
         Buffers: shared hit=516


                                 QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (actual time=0.109..0.109 rows=1)
   Buffers: shared hit=7
   ->  Index Only Scan using tbl_idx on tbl (actual time=0.019..0.080 rows=366)
         Index Cond: ((date_column >= '2016-01-01')
                  AND (date_column <  '2017-01-01'))
         Heap Fetches: 366
         Buffers: shared hit=7

Question 2 — Indexed Top-N

CREATE INDEX tbl_idx ON tbl (a, date_column);
SELECT *
  FROM tbl
 WHERE a = 123
 ORDER BY date_column DESC
 LIMIT 1;

The query uses the index (Index Scan) and fetches in reverse order (Backward). Note that there is no sort operation.

                  QUERY PLAN
----------------------------------------------
Limit  (cost=0.00..4.03 rows=1 width=12)
  ->  Index Scan Backward using tbl_idx on tbl
        (cost=0.00..164.96 rows=41)
        Index Cond: (a = ($1)::numeric)

Question 3 — Column Order

CREATE INDEX tbl_idx ON tbl (a, b);
SELECT *
  FROM tbl
 WHERE a = 123
   AND b = 1;
SELECT *
  FROM tbl
 WHERE b = 123;

The second query reads the entire table (Seq Scan). Changing the column order in the index allows both queries to use a (Bitmap) Index Scan.

                         QUERY PLAN
-------------------------------------------------------------
Bitmap Heap Scan on tbl  (cost=4.30..19.32 rows=4)
 Recheck Cond: ((a = ($1)::numeric) AND (b = ($2)::numeric))
 -> Bitmap Index Scan on tbl_idx  (cost=0.00..4.30 rows=4)
    Index Cond: ((a = ($1)::numeric) AND (b = ($2)::numeric))

                        QUERY PLAN
----------------------------------------------------------
Seq Scan on tbl  (cost=0.00..1261.00 rows=4545 width=20)
  Filter: (b = ($1)::numeric)

Question 4 — LIKE

CREATE INDEX tbl_idx ON tbl (text);
SELECT *
  FROM tbl
 WHERE text LIKE '%TERM%';

The query reads the entire table (Seq Scan).

                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..1136.00 rows=5 width=29)
   Filter: ((text)::text ~~ '%TERM%'::text)

Question 5 — Index Only Scan

CREATE INDEX tbl_idx ON tbl (a, date_column);
SELECT date_column, count(*)
  FROM tbl
 WHERE a = 123
 GROUP BY date_column;
SELECT date_column, count(*)
  FROM tbl
 WHERE a = 123
   AND b = 1
 GROUP BY date_column;

Although the index is used efficiently in both queries, the second query does perform an Index Only Scan. That means the second query performs less efficient as the first.

                                  QUERY PLAN
-------------------------------------------------------------------------------
 GroupAggregate  (cost=0.29..5.59 rows=40 width=12)
   Group Key: date_column
   ->  Index Only Scan using tbl_idx on tbl  (cost=0.29..4.99 rows=40 width=4)
         Index Cond: (a = '123'::numeric)

                                    QUERY PLAN
-----------------------------------------------------------------------------------
 GroupAggregate  (cost=129.03..129.05 rows=1 width=12)
   Group Key: date_column
   ->  Sort  (cost=129.03..129.04 rows=1 width=4)
         Sort Key: date_column
         ->  Bitmap Heap Scan on tbl  (cost=4.59..129.02 rows=1 width=4)
               Recheck Cond: (a = '123'::numeric)
               Filter: (b = '1'::numeric)
               ->  Bitmap Index Scan on tbl_idx  (cost=0.00..4.59 rows=40 width=0)
                     Index Cond: (a = '123'::numeric)

Old Question 5 — Range Condition

This sample is a little bit harder to proof, since the PostgreSQL's cost estimation doesn't reflect the performance difference correctly. So, we compare actual execution performance using EXPLAIN ANALYZE.

CREATE INDEX tbl_idx ON tbl (date_column, state);
CLUSTER tbl_idx ON tbl;
VACUUM ANALYZE tbl;

EXPLAIN ANALYZE
 SELECT id, date_column, state
   FROM tbl
  WHERE date_column >= current_date - INTERVAL '5' YEAR
    AND state = 'X';

DROP INDEX tbl_idx;
CREATE INDEX tbl_idx ON tbl (state, date_column);
CLUSTER tbl_idx ON tbl;
VACUUM ANALYZE tbl;

EXPLAIN ANALYZE
 SELECT id, date_column, state
   FROM tbl
  WHERE date_column >= current_date - INTERVAL '5' YEAR
    AND state = 'X';

DROP INDEX tbl_idx;

Both indexes are used, of course. The difference is only the execution time. The performance difference becomes bigger, if selecting a longer time period.

                        QUERY PLAN
------------------------------------------------------------
Index Scan using tbl_idx (actual time=0.071..0.764 rows=365)
Index Cond: (
 (date_column >= ((now())::date - '5 years'::interval year))
AND
 (state = 'X'::bpchar))
Total runtime: 0.842 ms

                        QUERY PLAN
-------------------------------------------------------------
Index Scan using tbl_idx (actual time=0.092..0.419 rows=365)
Index Cond: (
 (state = 'X'::bpchar)
AND
 (date_column >= ((now())::date - '5 years'::interval year)))
Total runtime: 0.520 ms

About the Author

Photo of Markus Winand

Markus Winand teaches efficient SQL—inhouse and online. He minimizes the development time using modern SQL and optimizes the runtime with smart indexing. His book entitled SQL Performance Explained has become standard reading.

Buy his Book on Amazon

Cover of “SQL Performance Explained”: Squirrel running on grass

The essence of SQL tuning in 200 pages

Buy on Amazon
(paperback only)

Paperback and PDF also available at Markus’ store.

Hire Markus

…to answer your current SQL questions.

The quick and easy way to benefit from his extensive knowledge and experience.
Learn more »

“Use The Index, Luke!” by Markus Winand is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 Unported License.
Legal | Contact | NO WARRANTY | Trademarks | Privacy | CC-BY-NC-ND 3.0 license