- About Optimizer Hints
- August 12–22: Online Training (EU shift)
- Clustered Indexes Dirty Secret
- FOSDEM Impressions
- June 8–18: Online Training (US shift)
- MySQL is to SQL like ??? to NoSQL
- Non-monetary micro sponsoring
- November 11-12 in Frankfurt am Main
- Online-Training in July and August
- Oracle + PostgreSQL
- Oracle Bootcamp
- PDF Registration
- Party time
- PostgreSQL Performance Event
- Quiz Results
- SQL Server Performance Kurs in Stuttgart
- SQL Server performance training in London
- Shipping Terms
- Springtime for SQL
- The two top performance problems caused by ORM tools
- Top Tweets January 2013
- Training Survey
- Training and Conference Dates
- Use The Index, Luke
2011-07-30MySQL Row Generator
A row generator is a method to generate numbered rows on demand. It is possible to build a row generator with pure standard SQL—using recursive common table expressions (CTE), the with clause. If you never heard of the with clause, it’s probably because MySQL doesn’t implement that specific part of the SQL-99 standard (feature request from 2006). This article introduces generator views for MySQL. Not as powerful as recursive CTEs, but good enough in most cases. But before going into the implementation details, I’ll show you a use case for a row generator.
Row generators are useful to fill gaps in results. Consider the following query:
SELECT COUNT(*), sale_date FROM sales WHERE sale_date > CURDATE() - INTERVAL 1 MONTH GROUP BY sale_date ORDER BY sale_date;
The result will not contain rows for days where no sales record exists at all. You can complete the result with a row generator.
Imagine a view,
GENERATOR, that returns numbered rows like this:
SELECT n FROM generator WHERE n < 31; +-----+ | n | +-----+ | 0 | | 1 | . . . . | 29 | | 30 | +-----+ 31 rows in set (0.00 sec)
This is the basic functionality of a row generator. Having that, it is quite simple to list all days since a month ago:
SELECT CURDATE() - INTERVAL n DAY dt FROM generator WHERE CURDATE() - INTERVAL n DAY > CURDATE() - INTERVAL 1 MONTH; +------------+ | dt | +------------+ | 2011-07-29 | | 2011-07-28 | . . . . . . . | 2011-07-01 | | 2011-06-30 | +------------+ 30 rows in set (0.00 sec)
Finally, we can use an outer join to combine the original result with the generated dates:
SELECT IFNULL(sales, 0) sales, dt sale_date FROM ( SELECT CURDATE() - INTERVAL n DAY dt FROM generator WHERE CURDATE() - INTERVAL n DAY > CURDATE() - INTERVAL 1 MONTH ) dates LEFT OUTER JOIN ( SELECT COUNT(*) sales, sale_date FROM sales WHERE sale_date > CURDATE() - INTERVAL 1 MONTH GROUP BY sale_date ) sales ON (sales.sale_date = dates.dt) ORDER BY dt;
The left side of the join has all the generated dates so that the outer join pads the right wide with
NULL, if there were no sale on that day. The
IFNULL turns the missing sales count into a zero.
So far, so good. But the problem is that MySQL has no row generator that produces an arbitrary number of rows as needed. Still there is a technique that is good enough in most cases.
It starts with something rather ridiculous:
CREATE OR REPLACE VIEW generator_16 AS SELECT 0 n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15;
It is a generator to produces 16 rows. You can use the same technique for larger generators as well. E.g., to produce one million rows, like shown at the end of this article. Just kidding. There is, of course, a better way:
CREATE OR REPLACE VIEW generator_256 AS SELECT ( hi.n * 16 + lo.n ) AS n FROM generator_16 lo , generator_16 hi;
This view builds the cross join (Cartesian product) of the previous view with itself. That means it pairs every row from the "lo" side with all rows from the "hi" side of the join. The result has 265 rows (16x16).
Considering the introductory example, it is not only important to generate an arbitrary number of rows, we need numbered rows to make use of it. For that purpose, I threat the two sides of the cross join like digits in a hexadecimal number. The "lo" side building the least significant, the "hi" side the most significant digit. Explained by SQL:
SELECT CONCAT(LPAD(hi.n,2,' '), ' * 16 + ' , LPAD(lo.n,2,' '), ' = ' , LPAD(hi.n*16+lo.n, 3,' ')) AS "HI LO SEQ" FROM generator_16 lo, generator_16 hi; +--------------------+ | HI LO SEQ | +--------------------+ | 0 * 16 + 0 = 0 | | 0 * 16 + 1 = 1 | | 0 * 16 + 2 = 2 | | 0 * 16 + 3 = 3 | | 0 * 16 + 4 = 4 | | 0 * 16 + 5 = 5 | | 0 * 16 + 6 = 6 | | 0 * 16 + 7 = 7 | | 0 * 16 + 8 = 8 | | 0 * 16 + 9 = 9 | | 0 * 16 + 10 = 10 | | 0 * 16 + 11 = 11 | | 0 * 16 + 12 = 12 | | 0 * 16 + 13 = 13 | | 0 * 16 + 14 = 14 | | 0 * 16 + 15 = 15 | | 1 * 16 + 0 = 16 | | 1 * 16 + 1 = 17 | . . . . . . . . . . . | 15 * 16 + 14 = 254 | | 15 * 16 + 15 = 255 | +--------------------+ 256 rows in set (0.00 sec)
I guess you know how to build a larger generator now?
There are, of course, some limitations:
The views are bounded
They cannot produce an arbitrary number of rows. But you can prepare views that generate large number of rows (see below).
The views will always build the full Cartesian product
Although you can limit the result with the where clause, the work to produce all rows is still done. It’s just filtering the unneeded. That means, it is very inefficient to use a large generator view if you need just a few rows.
Another aspect is that the result order is undefined—although the example above produces the numbers is ascending order. That’s just because the way MySQL joins the two views. If you build another meta view, e.g.,
GENERATOR_64K, you’ll note that the order is not ascending anymore—because of MySQLs Block Nested-Loop Join. But that doesn’t mean that the generator doesn’t work, each number is still generated exactly once. If you need a particular order, just use the
ORDER BY clause on the outermost select.
LIMIT to cut the view as needed because you might receive unexpected numbering. Please note that using
ORDER BY and
LIMIT in combination returns the correct result, but requires the intermediate result to be stored temporarily.
Use where to filter on the returned numbers. There is no need to store the intermediate result in that case.
Finally, here are some handy generator views up to 1 "mega-row". They use a minor improvement: bit-shift operations instead of arithmetics.
CREATE OR REPLACE VIEW generator_16 AS SELECT 0 n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11 UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15; CREATE OR REPLACE VIEW generator_256 AS SELECT ( ( hi.n << 4 ) | lo.n ) AS n FROM generator_16 lo, generator_16 hi; CREATE OR REPLACE VIEW generator_4k AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n FROM generator_256 lo, generator_16 hi; CREATE OR REPLACE VIEW generator_64k AS SELECT ( ( hi.n << 8 ) | lo.n ) AS n FROM generator_256 lo, generator_256 hi; CREATE OR REPLACE VIEW generator_1m AS SELECT ( ( hi.n << 16 ) | lo.n ) AS n FROM generator_64k lo, generator_16 hi;
Even the last view, producing 220 rows, returns the result in a seconds on current hardware. However, the only use I have for the
GENERATOR_1M view is producing test data.
Always use the smallest possible generator for best performance.