When using offset
➌ to skip the previously fetched entries❶, you’ll get duplicates in case there were new rows inserted between fetching two pages➋. There are other anomalies possible too, this is just the most common one.
This is not even a database problem, it is the way frameworks implement pagination: they just say which page number to fetch or how many rows to skip. With this information alone, no database can do any better.
Also note that the offset problem comes in may different syntaxes:
The offset keyword
The 2-parameter limit [offset,] limit
(the 1 parameter limit
is fine)
Lower-bound filtering based on row numbering (e.g. row_number()
, rownum
, …).
The root problem all these methods have in common is that they just provide a number of rows to be dropped—no more context. In this article, I’m using offset
to refer to any of these methods.
In this guest post, Chris Saxon explains a very important topic using an analogy with chocolates: When does a database use an index and when is it better not using it. Although Chris explanation has the Oracle database in mind, the principles apply to other databases too.
A common question that comes up when people start tuning queries is “why doesn’t this query use the index I expect?”. There are a few myths surrounding when database optimizers will use an index. A common one I’ve heard is that an index will be used when accessing 5% or less of the rows in a table. This isn’t the case however - the basic decision on whether or not to use an index comes down to its cost.
So, I’ve been to PgCon 2014 in Ottawa to give a short version of my SQL performance training. However, I think I ended up learning more about SQLite than about PostgreSQL there. Here is how that happened and what I actually learned.
Richard Hipp, creator of SQLite was the keynote speaker at this years PgCon. In his keynote (slides, video) he has put the focus on three topics: how PostgreSQL influenced SQLite development (“SQLite was originally written from PostgreSQL 6.5 documentation” and the “What Would PostgreSQL Do?” (WWPD) way of finding out what the SQL standard tries to tell us). The second main topic was that SQLite should be seen as an application file format—an alternative to inventing own file formats or using ZIPped XMLs. The statement “SQLite is not a replacement for PostgreSQL. SQLite is a replacement for fopen()” nails that (slide 21). Finally, Richard put a lot of emphasis on that fact that SQLite takes care of your data (crash safe, ACID)—unlike many of the so-called NoSQL systems, which Richard refers to as “Postmodern Databases: absence of objective truth; Queries return opinions rather than facts”. In his keynote, Richard has also shown that SQLite is pretty relaxed when it comes to data types. As a matter of fact, SQLite accepts strings like “Hello” for INT
fields. Note that it still stores “Hello”—no data is lost. I think he mentioned that it is possible to enforce the types via CHECK
constraints.
This is my own and very loose translation of an article I wrote for the Austrian newspaper derStandard.at in October 2013. As this article was very well received and the SQL vs. NoSQL discussion is currently hot again, I though it might be a good time for a translation.
Back in 2013 The Register reported that Google sets its bets on SQL again. On the first sight this might look like a surprising move because it was of all things Google’s publications about MapReduce and BigTable that gave the NoSQL movement a big boost in the first place. On a second sight it turns out that there is a trend to use SQL or similar languages to access data from NoSQL systems—and that’s not even a new trend. However, it raises a question: What remains of NoSQL if we add SQL again? To answer this question, I need to start with a brief summary about the history of databases.
Dear MySQL,
Thank you for introducing me to SQL. It must have been 1998 when we first met I and fell in love with the simplicity of SQL immediately. Before that I’ve been using C structs all the time; I had to do my joins programmatically and also create and maintain my indexes manually. It was even hard to combine several search conditions via and
and or
. But then there was the shiny new world of SQL you were showing me…
In 2011, I’ve launched the “The 3-Minute Test: What do you know about SQL performance.” It consists of five questions that follow a simple pattern: each question shows a query/index pair and asks if it demonstrates proper indexing or not. Till today, this test has become one of the most popular features on Use The Index, Luke and has been completed more than 28,000 times.
Just in case you got curious, please be aware that this article is a spoiler. You might want to do the quiz yourself before continuing.
Although the quiz was created for educational purposes, I was wondering if I could get some interesting figures out of these 28,000 results. And I think I could. However, there are several things to keep in mind when looking at these figures. First, the quiz uses the surprise factor to catch attention. That means, three questions show cases that look fine, but aren’t. One question does it the other way around and shows an example that might look dangerous, but isn’t. There is only one question where the correct answer is in line with the first impression. Another effect that might affect the significance of the results is that there was no representative selection of participants. Everybody can take the quiz. You can even do it multiple times and will probably get a better result the second time. Just keep in mind that the quiz was never intended to be used for scientific research upon the indexing knowledge in the field. Nevertheless, I think that the size of the dataset is still good enough to get an impression.
Below I’ll show two different statistics for each question. First, the average rate at which this question was correctly answered. Second, how this figure varies for users of MySQL, Oracle, PostgreSQL and SQL Server databases. In other word, it says if e.g. MySQL users are more knowledgeable about indexing as PostgreSQL users. Spoiler: It’s the other way around. The only reason I’m in the lucky position to have this data is that the test sometimes uses vendor specific syntax. For example, what is LIMIT
in MySQL and PostgreSQL is TOP
in SQL Server. Therefore, the participants have to select a database at the beginning so that the questions are shown in the native syntax of that product.
As you might have noticed—at least if you have read SQL Performance Explained—I don’t think clustered indexes are as useful as most people believe. That is mainly because it is just too darn difficult to choose a good clustering key. As a matter of fact, choosing a good—the “right”—clustering key is almost impossible if there are more than one or two indexes on the table. The result is that most people just stick to the default—which is the primary key. Unfortunately, this is almost always the worst possible choice.
In this article I explain the beast named clustered index and all it’s downsides. Although this article uses SQL Server as demo database, the article is equally relevant for MySQL/MariaDB with InnoDB and the Oracle database when using index-organized tables.
A Japanese translation of this article is available here.
Yesterday evening I tweeted: “MongoDB seems to be as Bad for NoSQL as MySQL is for SQL.” Unfortunately, I tweeted without context. But I guess I couldn’t have given all the required context in a single tweet anyway, so I’m dedicating this post to it. I hope this answers some of the questions I’ve got in response to the tweet.
UPDATE: SQLFiddle integration has been removed in 2015 due to the low usage and high effort on my side.
Quite often I’m asked what I think about query hints. The answer is more lengthy and probably also more two-fold than most people expect it to be. However, to answer this question once and forever, I though I should write it down.
The most important fact about query hints is that not all query hints are born equally. I distinguish two major types:
Most query hints are restricting hints: they limit the optimizers’ freedom to choose an execution plan. “Hint” is an incredibly bad name for these things as they force the optimizer to do what it has been told—probably the reason MySQL uses the FORCE
keyword for those.
I do not like restricting hints, yet I use them sometimes to test different execution plans. It usually goes like this: when I believe a different execution plan could (should?) give better performance, I just hint it to see if it really gives better performance. Quite often it becomes slower and sometimes I even realize that the execution plan I though of does not work at all—at least not with the database I’m working at that moment.
Typical examples for restricting query hints are hints that force the database to use or not use a particular index (e.g., INDEX
and NO_INDEX
in the Oracle database, USE INDEX
and IGNORE INDEX
in MySQL, or INDEX
, FORCESEEK
and the like in SQL Server).
So, what’s wrong with them? Well, the two main problems are that they (1) restrict the optimizer and that they (2) often need volatile object names as parameters (e.g., index names). Example: if you use a hint to use index ABC
for a query, the hint becomes ineffective when somebody changes the name of the index to ABCD
. Further, if you restrict the optimizer you can no longer expect it to adjust the execution plan if you add another index that servers the query better. Of course there are ways around these problems. The Oracle database, for example, offers “index description” hints to avoid both issues: instead of specifying the index name, it accepts a description of the ideal index (column list) and it selects the index that matches this definition best.
Nevertheless, I strongly recommend against using restricting query hints in production. Instead you should find out why the optimizer does “the wrong thing” and fix the root cause. Restricting hints fix the symptom, not the cause. That being said, I know that there is sometimes no other reasonable choice.
The second major type of query hints are supporting hints: they support the optimizer by providing information it doesn’t have otherwise. Supporting hints are rare—I’m only aware of a few good examples and the most useful one has already become obsolete: it’s FAST
number_rows (SQL Server) and FIRST_ROWS(n)
(Oracle). They tell the optimizer that the application plans to fetch only that many rows of the result. Consequently, the optimizer can prefer using indexes and nested loop joins that would be inefficient when fetching the full result (see Chapter 7, Partial Results for more details). Although being kind-of obsolete, I’m still using these hints as the defining example for supporting hints because they provide information the optimizer cannot have otherwise. This particular example is important enough that it was worth defining new keywords in the ISO SQL:2008: FETCH FIRST ... ROWS ONLY
and OFFSET
. That’s why this hint is a very good, yet obsolete example for supporting query hints.
Another example for supporting hints is the (undocumented) CARDINALITY
hint of the Oracle database. It basically overwrites the row count estimate of sub-queries. This hint was often used if the combined selectivity of two predicates was way off the product of the selectivity of each individual predicate (see “Combined Selectivity Example”). But this hint is also outdated since Oracle database 11g introduced extended statistics to cope with issues like that. SQL Server’s filtered statistics serve the same purpose. If your database cannot reflect data correlation in it’s statistics, you’ll need to fall back to restricting hints.
The Oracle hint OPT_ESTIMATE
is somehow the successor of the CARDINALITY
hint for cases when the estimations are still off. Pythian wrote a nice article about OPT_ESTIMATE
.
It think supporting hints are not that bad: they are just a way to cope with known limitations of the optimizer. That’s probably why they tend to become obsolete when the optimizers evolve.
Here is another slide deck for my “Pagination Done the Right Way” talk that I’ve given at many occasions.
Please also have a look at this blog post by Gary Millsap about “The Ramp”. Do you see how using OFFSET implements this anti-pattern?
I think I’ve actually never shared the slides of my talk given in Paris at Dalibo’s PostgreSQL Session about Performance. So, here they are.
When clients tell me about their plans to invest in SSD storage for their database, they often look at me like a doctor telling a patient about his deadly disease. I didn’t discuss this with my clients until recently, when one client just asked me straight away: “As SQL tuning guy, are you afraid of SSD because it kills your job?” Here is what I told that client.
A few weeks ago I invited you to take part in a survey about your interest in SQL performance training for developers. In the meanwhile there is a schedule for German and English trainings available. In particular, I’d like to point out four online courses I’m giving during summer time. Hope to see you there.
Another opportunity for a short get-together are conferences I’ll attend and/or speak at. The next one is the Prague PostgreSQL Developers’ Day (p2d2) next Thursday. You can buy SQL Performance Explained there (CZK 700; in the breaks and after the conference). The next conference that I can already confirm is this years PostgreSQL Conference Europe in Dublin end of October. You might have noticed that I attended a lot of PostgreSQL conferences recently (Brussels in February, Paris in March). I do plan to attend other conferences too and I’ve just filed some proposals for talks at other conferences. I’ll let you know if they are accepted.
“ORMs are not entirely useless…” I just tweeted in response to a not exactly constructive message that we should fire developers who want to use ORMs. But than I continued in a not exactly constructive tone myself when I wrote “…the problem is that the ORM authors don’t know anything about database performance”.
Good news everybody. I’m in a good mood today ;) I’ll explain. First of all, the French edition of SQL Performance Explained is almost done and will ship at the end of the month (pre-order now ;). It has a different name (SQL : Au cœur des performances) but is otherwise the same. So, that triggered some kind of “project done” feeling.
Further, I’ve given a few good trainings recently and looking forward for some more in the next weeks. It’s just awesome when the audience comes up with great questions, but it’s even more awesome if they come up with great conclusions that proof their correct understanding. That triggers a “missing accomplished” feeling.
Dalibo, the company where Guillaume Lelarge works (he does the French translation of SQL Performance Explained) invited me to give a talk at their (mostly French) for-free conference “Better Performance with PostgreSQL” on 28th March in Paris. Seats are limited: if you’d like to go there, register now!
This is a short collection of the best tweets on @SQLPerfTips and/or @MarkusWinand for those who missed them and those who don’t use twitter at all.
Last weekend I’ve visited the Free and Open source Software Developers’ European Meeting (FOSDEM) in Brussels, Belgium. It is a huge and free conference covering countless open source topics such as Open/Libre Office, Microkernels, MySQL, NoSQL, PostgreSQL, OWASP, and much more.
My talk “Pagination done the PostgreSQL way” was accepted for the one-day PostgreSQL conference PGDay on 1st Feb 2013 just before FOSDEM in Brussels.
This is a short collection of the best tweets on @SQLPerfTips and/or @MarkusWinand for those who missed them and those who don’t use twitter at all.
Good news everybody—well, at least if you speak French. Use The Index, Luke and SQL Performance Explained will be translated. You can already start reading the first chapters here.
You might guess it, but I’m not making my living from this site nor are the book sales paying all the bills—although the books pays some bills, thank you! However, to pay the remaining bills I’m doing coaching, training and consulting. You can find all the details on my professional site http://winand.at/ but I’d like to highlight one service here: Instant Coaching.
After googling and trying many ways to solve a problem, have you ever got that feeling of resignation where you just want to ask somebody who should know the answer? Well, that’s Instant Coaching—at least if your problem is about performance and/or databases.
Joe Celko’s review of my book SQL Performance Explained was just published. The bottom line: “This book is definitively worth having in the company library.”
With a little delay of just three weeks, I’m writing something about this years PostgreSQL conference Europe.
Including the training day, it was four days full of database stuff—in other terms: an awesome time, if you love databases :) But let’s start from the beginning.
If you are not following my SQL Performance Tips (@SQLPerfTips) feed on twitter, you are missing a lot. However, two things you should probably know, even without using twitter:
I’m in no way affiliated with those guys, but they were so nice to offer a discount code for my readers (followers?). The conference takes place in Oxford, UK on 23rd Nov 2012. Have a look at http://allyourbaseconf.com/ for more details. Use the promo code “SQLPerf” to get £35 off the standard ticket price.
My @SQLPerfTips account has crossed the 10k followers mark recently. So I though I’ll celebrate this with another discount code (10kflwrs) that is good for -10% for orders of “SQL Performance Explained” when placed directly at http://sql-performance-explained.com/. This code expires on Fri, 12th Oct 20:00 Vienna local time (CEST). That’s 11am PDT, for example.
As mentioned before, I’ll be at this year PostgreSQL Conference in Prague. Still two weeks to go, but I just had a few minutes “free” time and prepared my personal schedule.
As usual, I plan to attend as many sessions as possible. As usual, I’ll skip some in favour of a nice chat and/or change my mind in the last moment.
“Use The Index, Luke!” celebrates its second anniversary today. Unfortunately, I was not able to prepare anything for this anniversary like I did last year when I presented the 3-minute test. The test is still very popular, by they way, it is taken about 20 times a day. However, all I have for this years anniversary is a promise.
Do you know http://sqlfiddle.com/? You should ;) It’s an online tool to test SQL queries with five different databases (SQL Server, MySQL, Oracle, PostgreSQL and SQLite). It is not a cloud database that you could use for building applications. SQLFiddle is a tool for collaborative online work on SQL problems. It was built to improve asking and solving SQL related questions on Q/A sites like Stackoverflow.
After publishing the English edition of “SQL Performance Explained” there was a huge demand for an ebook edition. After a few days answering these queries, I though it’s probably easier to offer an e-book than to explain why there isn’t one. Well, I must admit that I don’t read ebooks very much. My wife has an e-ink device, but is hardly using it (probably in lack of German content). I’ve uploaded the ePUB pre-release from 2011 (not available anymore) and must say it works, but isn’t nice. However, many people told me they would be fine with a PDF, because they are anyway using tablets or smartphones to read their ebooks. Fast forward a few weeks: since Monday I can offer a PDF download for just EUR 9.95 (English, German [update] or French, you choose).
Afterwards there was a huge demand—again—for a bundle discount when purchasing both, the PDF and the hard copy. That caught me by surprise. Not that people want a discount, but that people would like to buy both editions. So I asked, and listened, and learned something about my own business.
I’ve finally managed to put all the errors that were reported by readers so far online. I’m not exactly proud of this, but there are already more than 30 known errata for the German edition. There are still some German copies of “SQL Performance Explained” on stock, but it seems like I have to arrange a second printing soon. I will surly take up all these errors for the second printing. The English edition has just two errata at the moment, but I doubt it will stay that low. You are more than welcome to report any mistakes you find: I do follow them up :)
And now, something different: customs. I’ve received some inquires if I can indicate how much customs you might have to pay if you order the book from outside the EU. Well, I cannot tell you the exact amount, but help you Google it ;) The important information you need to find the rate in your country is the so-called HS number and the origin of the goods. I’ve to put this information on the CN22 customs declaration on the outside of the package, so authorities know what’s inside. The HS number for books is 490199 and this particular book is printed in Austria. That’s what I write on the customs declaration form. I’ve also been told that some countries might charge other fees besides customs (e.g., VAT). Sorry, don’t know any more details.
The English edition of “SQL Performance Explained” is finally available. It has its own website http://sql-performance-explained.com/ where you can find a PDF preview and order information. The cover price is EUR 29.95 or GBP 26.99.
If you order it from outside the EU, you must also consider that you might be required to pay import duties in the destination country.
During the last few weeks, it has become a little silent here at Use The Index, Luke! Today, I’d like to tell you the news that caused the silence and give a short outlook about the future of Use The Index, Luke!
The most important news is that I became father. It’s very important to me, at least. My son, Marcel, was born on 25th March 2012. So, we already had two month to settle down. Mommy is quite good at the night shift—I can sleep almost normally and have enough time and energy to work during daytime.
Use The Index, Luke! was one today.
There was quite some progress during this year. Besides the steadily growing content, I am particularly happy that the appendices about execution plans and the example schema cover MySQL, PostgreSQL and SQL Server in addition to Oracle.
MariaDB 10.2 introduced support for common table expressions.
MySQL 8.0 introduced support for common table expressions.
A row generator is a method to generate numbered rows on demand. It is possible to build a row generator with pure standard SQL—using recursive common table expressions (CTE), the with
clause. If you never heard of the with
clause, it’s probably because MySQL doesn’t implement that specific part of the SQL-99 standard (feature request from 2006). This article introduces generator views for MySQL. Not as powerful as recursive CTEs, but good enough in most cases. But before going into the implementation details, I’ll show you a use case for a row generator.
There was a discussion about bind parameters and execution plan caching in a recent reddit thread about my “Avoid Smart Logic for Conditional WHERE
clauses” page. My article mentions that bind variables make the actual values “invisible” for the optimizer resulting in a generic plan that can be cached and re-used later on. Jeff Davis emphasized that using bind parameters and the caching of execution plans are two distinct concepts and provided a PostgreSQL/Ruby example for that.
I have one wish for each database. Some of them are big wishes, so I thought I’ll better send them early. Only six month until Christmas, you know ;)
'' IS NULL?
Are you serious? I know, it has always been that way, but does that mean it will stay so forever?
Could you please enable READ_COMMITTED_SNAPSHOT
per default? Or, at least, make it a mandatory question during database creation? That could prevent so many locking problems.
Could you please implement index-only scans (aka “covering indexes”)? It’s a great performance feature, which I really miss in PostgreSQL.
UPDATE: This one became true with PostgreSQL 9.2.
Could you please implement the hash join algorithm? So many applications suffer from poor join performance, just because MySQL doesn’t have hash joins.
From today’s perspective, it looks like my PostgreSQL wish is the only one that might come true. But I’ll keep hoping for the others as well.
The essence of SQL tuning in 200 pages
Buy now!
(paperback and/or PDF)
Paperback also available at Amazon.com.
Markus offers SQL training and consulting for developers working at companies of any size.
Learn more »