PostgreSQL Example Scripts for "The Where Clause"


The scripts provided in this appendix are ready to run and were tested on PostgreSQL 9. Most of the examples will also work on earlier releases.

The Equals Operator

Surrogate Keys

The following script creates the EMPLOYEES table with 1000 entries.

CREATE TABLE employees (
   employee_id   NUMERIC       NOT NULL,
   first_name    VARCHAR(1000) NOT NULL,
   last_name     VARCHAR(1000) NOT NULL,
   date_of_birth DATE                   ,
   phone_number  VARCHAR(1000) NOT NULL,
   junk          CHAR(1000)             ,
   CONSTRAINT employees_pk PRIMARY KEY (employee_id)
);

CREATE FUNCTION random_string(len INTEGER)
RETURNS VARCHAR(1000)
AS
$$
DECLARE
  rv VARCHAR(1000) := '';
  i  INTEGER := 0;
BEGIN
  IF len < 1 THEN
    RETURN rv;
  END IF;

  FOR i IN 1..len LOOP
    rv := rv || chr(97+(random() * 25)::int);
  END LOOP;
  RETURN rv;
END;
$$ LANGUAGE plpgsql;


INSERT INTO employees (employee_id,  first_name,
                       last_name,    date_of_birth, 
                       phone_number, junk)
SELECT GENERATE_SERIES
     , initcap(lower(random_string(2+(random()*8)::int)))
     , initcap(lower(random_string(2+(random()*8)::int)))
     , CURRENT_DATE - (random() * 365 * 10)::int - 40 * 365
     , (random() * 9000 + 1000)::int
     , 'junk'
  FROM GENERATE_SERIES(1, 1000); 


UPDATE employees 
   SET first_name='MARKUS', 
       last_name='WINAND'
 WHERE employee_id=123;

ANALYZE employees;

Notes:

  • The JUNK column is used to get a realistic row length. Because its data type is CHAR, as opposed to VARCHAR, it always occupies the full 1000 bytes it can hold. Without this column the table would be unrealistically small and many demonstrations would not work.

  • The table is filled with random data, except for my entry, which is updated after the insert.

  • Table statistics are gathered so that the optimizer knows a little bit about the table’s content.
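
With the data in place, a primary key lookup can be verified with EXPLAIN. The following query is not part of the original script, but follows directly from the setup above:

```sql
-- Sanity check (not in the original script): an equality search
-- on the primary key should show an index access on employees_pk
-- rather than a sequential scan. Exact cost figures will vary.
EXPLAIN
 SELECT first_name, last_name
   FROM employees
  WHERE employee_id = 123;
```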

Concatenated Keys

This script changes the EMPLOYEES table so that it reflects the situation after the merger with Very Big Company:

-- add subsidiary_id and update existing records
ALTER TABLE employees ADD subsidiary_id NUMERIC;
UPDATE      employees SET subsidiary_id = 30;
ALTER TABLE employees ALTER COLUMN subsidiary_id SET NOT NULL;

-- change the PK
ALTER TABLE employees DROP CONSTRAINT employees_pk;
ALTER TABLE employees ADD CONSTRAINT employees_pk 
      PRIMARY KEY (employee_id, subsidiary_id);

-- generate more records (Very Big Company)
INSERT INTO employees (employee_id,  first_name,
                       last_name,    date_of_birth, 
                       phone_number, subsidiary_id, junk)
SELECT GENERATE_SERIES
     , initcap(lower(random_string(2+(random()*8)::int)))
     , initcap(lower(random_string(2+(random()*8)::int)))
     , CURRENT_DATE - (random() * 365 * 10)::int - 40 * 365
     , (random() * 9000 + 1000)::int
     , (random() * least((generate_series % 2)+0.2,1) * (generate_series-1000)/9000*29)::int
     , 'junk'
  FROM GENERATE_SERIES(1001, 10000);

ANALYZE employees;

Notes:

  • The new primary key is just extended by the SUBSIDIARY_ID; that is, the EMPLOYEE_ID remains in the first position.

  • The new records are randomly assigned to the subsidiaries 1 through 29.

  • The table and index are analyzed again to make the optimizer aware of the grown data volume.
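
At this point the primary key index still starts with EMPLOYEE_ID, so a query that filters on SUBSIDIARY_ID alone cannot use it efficiently. A quick check (a sketch, not part of the original script):

```sql
-- With the PK index on (employee_id, subsidiary_id), there is
-- no filter on the leading column, so PostgreSQL will typically
-- resort to a full table scan (Seq Scan) for this query.
EXPLAIN
 SELECT first_name, last_name
   FROM employees
  WHERE subsidiary_id = 20;
```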

The next script introduces the index on SUBSIDIARY_ID to support the query for all employees of one particular subsidiary:

CREATE INDEX emp_sub_id ON employees (subsidiary_id);

Although that gives decent performance, it’s better to use the index that supports the primary key:

-- use tmp index to support the PK
CREATE UNIQUE INDEX employee_pk_tmp 
    ON employees (subsidiary_id, employee_id);

ALTER TABLE employees 
  ADD CONSTRAINT employees_pk_tmp
      UNIQUE (subsidiary_id, employee_id);

ALTER TABLE employees
 DROP CONSTRAINT employees_pk;

ALTER TABLE employees
  ADD CONSTRAINT employees_pk
      PRIMARY KEY (subsidiary_id, employee_id);

ALTER TABLE employees
 DROP CONSTRAINT employees_pk_tmp;

-- drop old indexes
DROP INDEX employee_pk_tmp;
DROP INDEX emp_sub_id;

Notes:

  • A new index is created and used to support the PK.

  • Once the old PK index isn’t used by the constraint anymore, it can be dropped and recreated with its new column order.

  • The constraint is changed again to use the new PK index and the temporary index can be dropped—as well as the index on the subsidiary id that isn’t required anymore.
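
After the reorder, the same query by SUBSIDIARY_ID alone can be served by the primary key index. A hypothetical check, analogous to the one before the change:

```sql
-- With the new column order (subsidiary_id, employee_id), this
-- query should now show an index access on employees_pk instead
-- of a sequential scan.
EXPLAIN
 SELECT first_name, last_name
   FROM employees
  WHERE subsidiary_id = 20;
```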

Functions

Case-Insensitive Search

The randomized names were already created in correct case, just update “my” record:

UPDATE employees 
   SET first_name = 'Markus'
     , last_name  = 'Winand'
 WHERE employee_id   = 123
   AND subsidiary_id = 30;

The statement to create the function-based index:

CREATE INDEX emp_up_name
    ON employees (UPPER(last_name) varchar_pattern_ops);
DROP INDEX emp_name;
ANALYZE employees;
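
To benefit from the function-based index, the query must use the same expression that was indexed. A sketch (the plan wording may differ between releases):

```sql
-- Case-insensitive search: UPPER(last_name) matches the indexed
-- expression, so the emp_up_name index is applicable.
EXPLAIN
 SELECT first_name, last_name, phone_number
   FROM employees
  WHERE UPPER(last_name) = UPPER('winand');
```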

User-Defined Functions

The following defines a PL/pgSQL function that calculates the age and attempts to use it in an index:

CREATE FUNCTION get_age(date_of_birth DATE) 
RETURNS NUMERIC
AS
$$
BEGIN
    RETURN DATE_PART('year', AGE(date_of_birth));
END;
$$ LANGUAGE plpgsql;

CREATE INDEX invalid ON EMPLOYEES (get_age(date_of_birth));

You should get the error "functions in index expression must be marked IMMUTABLE".
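
The function cannot honestly be declared IMMUTABLE because its result depends on the current date. A common workaround (a sketch, not part of the original script) is to index DATE_OF_BIRTH directly and express the age condition as a date range:

```sql
-- Instead of indexing get_age(date_of_birth), index the raw
-- column and rewrite "age = 42" as a range on date_of_birth:
-- anyone aged 42 was born more than 42 but at most 43 years ago.
CREATE INDEX emp_dob ON employees (date_of_birth);

SELECT first_name, last_name
  FROM employees
 WHERE date_of_birth >  CURRENT_DATE - INTERVAL '43 years'
   AND date_of_birth <= CURRENT_DATE - INTERVAL '42 years';
```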

Partial Indexes

CREATE TABLE messages AS (
SELECT GENERATE_SERIES::numeric id
     , CASE WHEN random() < 0.01 THEN 'N' ELSE 'Y' END processed
     , trunc(random() * 100)::numeric receiver
     , 'junk' message
  FROM GENERATE_SERIES(0, 999999)
);


CREATE INDEX messages_todo
          ON messages (receiver, processed);
PREPARE stmt(int) AS
 SELECT message
   FROM messages
  WHERE processed = 'N'
    AND receiver  = $1;


EXPLAIN EXECUTE stmt(1);
                                  QUERY PLAN                                  
------------------------------------------------------------------------------
 Bitmap Heap Scan on messages  (cost=4.66..100.58 rows=25 width=32)
   Recheck Cond: ((receiver = ($1)::numeric) AND (processed = 'N'::text))
   ->  Bitmap Index Scan on messages_todo  (cost=0.00..4.66 rows=25 width=0)
         Index Cond: ((receiver = ($1)::numeric) AND (processed = 'N'::text))

A partial index stores only the rows matching its WHERE clause:

CREATE INDEX messages_only_todo
          ON messages (receiver)
       WHERE processed = 'N';

PREPARE stmt(int) AS
 SELECT message
   FROM messages
  WHERE processed = 'N'
    AND receiver  = $1;


EXPLAIN EXECUTE stmt(1);
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Index Scan using messages_only_todo on messages  (cost=0.00..8.27 rows=1 width=32)
   Index Cond: (receiver = ($1)::numeric)
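
The benefit of the partial index is its size: it contains only the unprocessed rows. A hypothetical comparison using standard PostgreSQL size functions:

```sql
-- Compare the two indexes; the partial index covers only the
-- roughly 1% of rows with processed = 'N', so it should be a
-- small fraction of the size of messages_todo.
SELECT pg_size_pretty(pg_relation_size('messages_todo'))      AS full_index
     , pg_size_pretty(pg_relation_size('messages_only_todo')) AS partial_index;
```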

About the Author

As an author, trainer, and coach, Markus Winand specializes in helping developers cope with SQL performance issues. He also published the book SQL Performance Explained and tweets his best performance tips via @SQLPerfTips. http://winand.at/
