MySQL Row Generator


A row generator is a method to generate numbered rows on demand. It is possible to build a row generator with pure standard SQL—using recursive common table expressions (CTE), the with clause. If you never heard of the with clause, it’s probably because MySQL doesn’t implement that specific part of the SQL-99 standard (feature request from 2006). This article introduces generator views for MySQL. Not as powerful as recursive CTEs, but good enough in most cases. But before going into the implementation details, I’ll show you a use case for a row generator.

Row generators are useful to fill gaps in results. Consider the following query:

SELECT COUNT(*), sale_date
  FROM sales
 WHERE sale_date > CURDATE() - INTERVAL 1 MONTH
 GROUP BY sale_date
 ORDER BY sale_date;

The result will not contain rows for days where no sales record exists at all. You can complete the result with a row generator.

Imagine a view, GENERATOR, that returns numbered rows like this:

SELECT n
  FROM generator
 WHERE n < 31;

+-----+
|  n  |
+-----+
|   0 | 
|   1 | 
. . . .
|  29 | 
|  30 | 
+-----+
31 rows in set (0.00 sec)

This is the basic functionality of a row generator. Having that, it is quite simple to list all days since a month ago:

SELECT CURDATE() - INTERVAL n DAY dt
  FROM generator
 WHERE CURDATE() - INTERVAL n DAY 
     > CURDATE() - INTERVAL 1 MONTH;

+------------+
| dt         |
+------------+
| 2011-07-29 | 
| 2011-07-28 | 
. . . . . . . 
| 2011-07-01 |
| 2011-06-30 | 
+------------+
30 rows in set (0.00 sec)

Finally, we can use an outer join to combine the original result with the generated dates:

SELECT IFNULL(sales, 0) sales, dt sale_date
  FROM
     ( SELECT CURDATE() - INTERVAL n DAY dt
         FROM generator
        WHERE CURDATE() - INTERVAL n DAY 
            > CURDATE() - INTERVAL 1 MONTH
     ) dates
  LEFT OUTER JOIN
     ( SELECT COUNT(*) sales, sale_date
         FROM sales
        WHERE sale_date > CURDATE() - INTERVAL 1 MONTH
        GROUP BY sale_date
     ) sales
    ON (sales.sale_date = dates.dt)
 ORDER BY dt;

The left side of the join has all the generated dates so that the outer join pads the right wide with NULL, if there were no sale on that day. The IFNULL turns the missing sales count into a zero.

There is something for everyone:
training, tuning and literature on SQL performance

So far, so good. But the problem is that MySQL has no row generator that produces an arbitrary number of rows as needed. Still there is a technique that is good enough in most cases.

It starts with something rather ridiculous:

CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL 
   SELECT 3   UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL
   SELECT 6   UNION ALL SELECT 7  UNION ALL SELECT 8  UNION ALL
   SELECT 9   UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
   SELECT 12  UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL 
   SELECT 15;

It is a generator to produces 16 rows. You can use the same technique for larger generators as well. E.g., to produce one million rows, like shown at the end of this article. Just kidding. There is, of course, a better way:

CREATE OR REPLACE VIEW generator_256
AS SELECT ( hi.n * 16 + lo.n ) AS n
     FROM generator_16 lo
        , generator_16 hi;

This view builds the cross join (Cartesian product) of the previous view with itself. That means it pairs every row from the "lo" side with all rows from the "hi" side of the join. The result has 265 rows (16x16).

Considering the introductory example, it is not only important to generate an arbitrary number of rows, we need numbered rows to make use of it. For that purpose, I threat the two sides of the cross join like digits in a hexadecimal number. The "lo" side building the least significant, the "hi" side the most significant digit. Explained by SQL:

SELECT CONCAT(LPAD(hi.n,2,' '), ' * 16 + '
            , LPAD(lo.n,2,' '), ' = '
            , LPAD(hi.n*16+lo.n, 3,' '))
              AS "HI        LO   SEQ"
  FROM generator_16 lo, generator_16 hi;

+--------------------+
| HI        LO   SEQ |
+--------------------+
|  0 * 16 +  0 =   0 | 
|  0 * 16 +  1 =   1 | 
|  0 * 16 +  2 =   2 | 
|  0 * 16 +  3 =   3 | 
|  0 * 16 +  4 =   4 | 
|  0 * 16 +  5 =   5 | 
|  0 * 16 +  6 =   6 | 
|  0 * 16 +  7 =   7 | 
|  0 * 16 +  8 =   8 | 
|  0 * 16 +  9 =   9 | 
|  0 * 16 + 10 =  10 | 
|  0 * 16 + 11 =  11 | 
|  0 * 16 + 12 =  12 | 
|  0 * 16 + 13 =  13 | 
|  0 * 16 + 14 =  14 | 
|  0 * 16 + 15 =  15 | 
|  1 * 16 +  0 =  16 | 
|  1 * 16 +  1 =  17 | 
. . . . . . . . . . .
| 15 * 16 + 14 = 254 | 
| 15 * 16 + 15 = 255 | 
+--------------------+
256 rows in set (0.00 sec)

I guess you know how to build a larger generator now?

There are, of course, some limitations:

  • The views are bounded

    They cannot produce an arbitrary number of rows. But you can prepare views that generate large number of rows (see below).

  • The views will always build the full Cartesian product

    Although you can limit the result with the where clause, the work to produce all rows is still done. It’s just filtering the unneeded. That means, it is very inefficient to use a large generator view if you need just a few rows.

Another aspect is that the result order is undefined—although the example above produces the numbers is ascending order. That’s just because the way MySQL joins the two views. If you build another meta view, e.g., GENERATOR_64K, you’ll note that the order is not ascending anymore—because of MySQLs Block Nested-Loop Join. But that doesn’t mean that the generator doesn’t work, each number is still generated exactly once. If you need a particular order, just use the ORDER BY clause on the outermost select.

Warning

Don’t use LIMIT to cut the view as needed because you might receive unexpected numbering. Please note that using ORDER BY and LIMIT in combination returns the correct result, but requires the intermediate result to be stored temporarily.

Use where to filter on the returned numbers. There is no need to store the intermediate result in that case.

Finally, here are some handy generator views up to 1 "mega-row". They use a minor improvement: bit-shift operations instead of arithmetics.

CREATE OR REPLACE VIEW generator_16
AS SELECT 0 n UNION ALL SELECT 1  UNION ALL SELECT 2  UNION ALL 
   SELECT 3   UNION ALL SELECT 4  UNION ALL SELECT 5  UNION ALL
   SELECT 6   UNION ALL SELECT 7  UNION ALL SELECT 8  UNION ALL
   SELECT 9   UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL
   SELECT 12  UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL 
   SELECT 15;

CREATE OR REPLACE VIEW generator_256
AS SELECT ( ( hi.n << 4 ) | lo.n ) AS n
     FROM generator_16 lo, generator_16 hi;

CREATE OR REPLACE VIEW generator_4k
AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n
     FROM generator_256 lo, generator_16 hi;

CREATE OR REPLACE VIEW generator_64k
AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n
     FROM generator_256 lo, generator_256 hi;

CREATE OR REPLACE VIEW generator_1m
AS SELECT ( ( hi.n << 16 ) | lo.n ) AS n
     FROM generator_64k lo, generator_16 hi;

Even the last view, producing 220 rows, returns the result in a seconds on current hardware. However, the only use I have for the GENERATOR_1M view is producing test data.

Always use the smallest possible generator for best performance.

If you like my way to explain things, you’ll love SQL Performance Explained.

About the Author

Photo of Markus Winand
Markus Winand tunes developers for high SQL performance. He also published the book SQL Performance Explained and offers in-house training as well as remote coaching at http://winand.at/

?Recent questions at
Ask.Use-The-Index-Luke.com

0
votes
2
answers
731
views

different execution plans after failing over from primary to standby server

yesterday Markus Winand ♦♦ 741
oracle index update
1
vote
1
answer
65
views

Generate test data for a given case

Sep 14 at 18:11 Markus Winand ♦♦ 741
testcase postgres
0
votes
1
answer
213
views

Database design suggestions for a data scraping/warehouse application?

Aug 27 at 09:29 Markus Winand ♦♦ 741
mysql optimization database