PostgreSQL Example Scripts for “3-Minuten Test”


This section contains the create, insert and select statements for the “Test your SQL Know-How in 3 Minutes” test. You may want to test yourself before reading this page.

The execution plans shown are abbreviated for better readability.

Table Setup

CREATE TABLE tbl (
  id          NUMERIC NOT NULL,
  date_column DATE,
  a           NUMERIC,
  b           NUMERIC,
  text        VARCHAR(255),
  state       CHAR(1),
  PRIMARY KEY (id)
);


CREATE OR REPLACE FUNCTION random_string(len INTEGER)
RETURNS VARCHAR(1000)
AS
$$
DECLARE
  rv VARCHAR(1000) := '';
  i  INTEGER := 0;
BEGIN
  IF len < 1 THEN
    RETURN rv;
  END IF;

  FOR i IN 1..len LOOP
    rv := rv || chr(97+(random() * 25)::int);
  END LOOP;
  RETURN rv;
END;
$$ LANGUAGE plpgsql;


INSERT INTO tbl
SELECT generate_series
     , CURRENT_DATE - generate_series
     , MOD(generate_series,1234)
     , (RANDOM() * 10)::INT, random_string(20)
     , CASE MOD(generate_series,5) WHEN 0 THEN 'X' ELSE 'A' END
FROM generate_series(1, 50000);

ANALYZE tbl;

Question 1 — DATE Anit-Pattern

CREATE INDEX tbl_idx ON tbl (date_column);


EXPLAIN
 SELECT text, date_column
   FROM tbl
  WHERE TO_CHAR(date_column, 'YYYY') = '2011';

EXPLAIN
 SELECT text, date_column
   FROM tbl
  WHERE date_column >= TO_DATE('2011-01-01', 'YYYY-MM-DD')
    AND date_column <  TO_DATE('2012-01-01', 'YYYY-MM-DD');


DROP INDEX tbl_idx;

The first execution plan performs a full table scan (Seq Scan). The second execution plan, on the other hand, performs an index range scan (Index Scan).

                           QUERY PLAN
---------------------------------------------------------------
Seq Scan on tbl  (cost=0.00..1386.00 rows=250)
  Filter: (to_char((date_column), 'YYYY'::text) = '2011'::text)


                           QUERY PLAN
---------------------------------------------------------------
Index Scan using tbl_idx on tbl  (cost=0.01..14.36 rows=205)
Index Cond: ((
 date_column >= to_date('2011-01-01'::text, 'YYYY-MM-DD'::text)
) AND (
 date_column <  to_date('2012-01-01'::text, 'YYYY-MM-DD'::text)
))

Question 2 — Indexed Top-N

CREATE INDEX tbl_idx ON tbl (a, date_column);

PREPARE STMT(int) AS
 SELECT id, date_column
   FROM tbl
  WHERE a = $1
  ORDER BY date_column DESC
  LIMIT 1;

EXPLAIN EXECUTE STMT(123);
DEALLOCATE STMT;

DROP INDEX tbl_idx;

The query uses the index (Index Scan) and fetches in reverse order (Backward). Note that there is no sort operation.

                  QUERY PLAN
----------------------------------------------
Limit  (cost=0.00..4.03 rows=1 width=12)
  ->  Index Scan Backward using tbl_idx on tbl
        (cost=0.00..164.96 rows=41)
        Index Cond: (a = ($1)::numeric)

Question 3 — Column Order

CREATE INDEX tbl_idx ON tbl (a, b);

PREPARE STMT(int, int) AS
 SELECT id, a, b
   FROM tbl
  WHERE a = $1
    AND b = $2;

EXPLAIN EXECUTE STMT(123, 1);
DEALLOCATE STMT;

PREPARE STMT(int) AS
 SELECT id, a, b
   FROM tbl
  WHERE b = $1;

EXPLAIN EXECUTE STMT(1);
DEALLOCATE STMT;

DROP INDEX tbl_idx;

The second query reads the entire table (Seq Scan). Changing the column order in the index allows both queries to use a (Bitmap) Index Scan.

                         QUERY PLAN
-------------------------------------------------------------
Bitmap Heap Scan on tbl  (cost=4.30..19.32 rows=4)
 Recheck Cond: ((a = ($1)::numeric) AND (b = ($2)::numeric))
 -> Bitmap Index Scan on tbl_idx  (cost=0.00..4.30 rows=4)
    Index Cond: ((a = ($1)::numeric) AND (b = ($2)::numeric))

                        QUERY PLAN
----------------------------------------------------------
Seq Scan on tbl  (cost=0.00..1261.00 rows=4545 width=20)
  Filter: (b = ($1)::numeric)

Question 4 — LIKE

CREATE INDEX tbl_idx ON tbl (text varchar_pattern_ops);

EXPLAIN
 SELECT id, text
   FROM tbl
  WHERE text LIKE '%TERM%';

DROP INDEX tbl_idx;

The query reads the entire table (Seq Scan).

                      QUERY PLAN
-------------------------------------------------------
 Seq Scan on tbl  (cost=0.00..1136.00 rows=5 width=29)
   Filter: ((text)::text ~~ '%TERM%'::text)

Question 5 — Range Condition

This sample is a little bit harder to proof, since the PostgreSQL’s cost estimation doesn’t reflect the performance difference correctly. So, we compare actual execution performance using EXPLAIN ANALYZE.

CREATE INDEX tbl_idx ON tbl (date_column, state);
CLUSTER tbl_idx ON tbl;
ANALYZE tbl;

EXPLAIN ANALYZE
 SELECT id, date_column, state
   FROM tbl
  WHERE date_column >= now()::date - INTERVAL '5' YEAR
    AND state = 'X';

DROP INDEX tbl_idx;
CREATE INDEX tbl_idx ON tbl (state, date_column);
CLUSTER tbl_idx ON tbl;
ANALYZE tbl;

EXPLAIN ANALYZE
 SELECT id, date_column, state
   FROM tbl
  WHERE date_column >= now()::date - INTERVAL '5' YEAR
    AND state = 'X';

DROP INDEX tbl_idx;

Both indexes are used, of course. The difference is only the execution time. The performance difference becomes bigger, if selecting a longer time period.

                        QUERY PLAN
------------------------------------------------------------
Index Scan using tbl_idx (actual time=0.071..0.764 rows=365)
Index Cond: (
 (date_column >= ((now())::date - '5 years'::interval year))
AND
 (state = 'X'::bpchar))
Total runtime: 0.842 ms

                        QUERY PLAN
-------------------------------------------------------------
Index Scan using tbl_idx (actual time=0.092..0.419 rows=365)
Index Cond: (
 (state = 'X'::bpchar)
AND
 (date_column >= ((now())::date - '5 years'::interval year)))
Total runtime: 0.520 ms

About the Author

As an author, trainer, and coach Markus Winand specializes in helping developers cope with SQL performance issues. He also published the book SQL Performance Explained and tweets his best performance tips via @SQLPerfTips.http://winand.at/

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
101
views
0
votes
0
answers
349
views

Fanout in R-Tree

Mar 27 at 08:07 jamie 1
tree indexing
0
votes
1
answer
132
views

Think About It

Mar 26 at 12:54 Markus Winand ♦♦ 511
reflection