Blog


Pagination Done the Right Way

Here is another slide deck for my "Pagination Done the Right Way" talk that I've given at many occassions.

Please also have a look at this blog post by Gary Millsap about “The Ramp”. Do you see how using OFFSET implements this anti-pattern?

Indexes: The neglected performance all-rounder

I think I've actually never shared the slides of my talk given in Paris at Dalibo's PostgreSQL Session about Performance. So, here they are.

Afraid of SSD?

When clients tell me about their plans to invest in SSD storage for their database, they often look at me like a doctor telling a patient about his deadly disease. I didn’t discuss this with my clients until recently, when one client just asked me straight away: “As SQL tuning guy, are you afraid of SSD because it kills your job?” Here is what I told that client.

Generally, people seem to believe that SSD are just much faster than HDD. As a matter of fact, this is only partially true because—as I also mentioned in Chapter 3—performance has two dimensions: response time and throughput. Although SSDs tend to deliver more performance on both axes, it has to be considered that the throughput delivered by SSD is “only” about five times as high as that of HDDs. That’s because HDDs are not bad at sequential read/write operations anyway.

Sure enough a five times faster storage makes many performance problems go away…for a while…until you have five times more data. For a decently growing startup it might just take a few months until you have the same problem again. However, this is not the crucial point here. The crucial point is that SSDs essentially fix the one performance issue where HDDs are really bad at: the response time. Due to the lack of moving parts, the response time of SSDs is about fifty times faster as that of HDDs. Well, that really helps solving problems for a while.

However, there is a catch—maybe even a Catch-22: If you want to get the factor 50 speed-up of SSDs, you’d better avoid reading large chunks of sequential data, because that’s where you can only gain a factor five improvement. To put that into database context: if you are doing many full table scans, you won’t get the full potential of SSD. On the other hand, index lookups have a tendency to cause many random IO operations and can thus benefit from the fast response time of SSDs. The fun part is that properly indexed databases get better benefits from SSD than poorly indexed ones. But guess who is most desperately betting on SSD to solve their performance problems? People having proper indexes or those who don’t have them?

The story goes on: which database operation do you think causes most random IO operations? Of course it’s our old friend the join—it is the sole purpose of joins to gather many little data fragments from different places and combine them into the result we want. Joins can also greatly benefit from SSDs. SSDs actually voids one of arguments often brought up by NoSQL folks against relational databases: with SSD it doesn’t matter that much if you fetch data from one place or from many places.

To conclude what I said to my client: No, as an indexing-focused SQL performance guy, I’m absolutely not afraid of SSD.

If you like my way to explain things, you’ll love SQL Performance Explained.

Training and Conference Dates

A few weeks ago I invited you to take part in a survey about your interest in SQL performance training for developers. In the meanwhile there is a schedule for German and English trainings available. In particular, I’d like to point out four online courses I’m giving during summer time. Hope to see you there.

Another opportunity for a short get-together are conferences I’ll attend and/or speak at. The next one is the Prague PostgreSQL Developers’ Day (p2d2) next Thursday. You can buy SQL Performance Explained there (CZK 700; in the breaks and after the conference). The next conference that I can already confirm is this years PostgreSQL Conference Europe in Dublin end of October. You might have noticed that I attended a lot of PostgreSQL conferences recently (Brussels in February, Paris in March). I do plan to attend other conferences too and I’ve just filed some proposals for talks at other conferences. I’ll let you know if they are accepted.

One more thing: in case you are involved in the organization of developer and/or database centric event—no matter how small—you might want to have a look at the small sponsoring I can offer.

The two top performance problems caused by ORM tools

ORMs are not entirely useless…” I just tweeted in response to a not exactly constructive message that we should fire developers who want to use ORMs. But than I continued in a not exactly constructive tone myself when I wrote "…the problem is that the ORM authors don’t know anything about database performance”.

Well, I don’t think they “don’t know anything” but I wonder why they don’t provide decent solutions (including docs) for the two most striking performance problems caused by ORMs? Here they are:

The infamous N+1 selects problem

This problem is actually well-known and I believe there are working solutions in all recent ORMs. The problem that remains is about documentation and culture: although there are solutions, many developers are not aware of them or still live in the “joins are slow—let’s avoid them” universe. The N+1 selects problem seems to be on the decline but I still think ORMs’ documentation should put more emphasis on joins.

For me, it looks very similar to the SQL injection topic: each and every database access layer provides bind parameters but the documentation and books just show examples using literal values. The result: SQL injection was the most dangerous weakness in the CWE/SANS Top 25 list. Not because the tools don’t provide proper ways to close that hole, but because the examples don’t use them consistently.

The hardly-known Index-Only Scan

Although the Index-Only Scan is one of the most powerful SQL tuning techniques, it seems to be hardly known by developers (related SO question I was involved in recently). However, I’ll try to make the story short.

Whenever a database uses an index to find the requested data quickly, it can also use the information in the index to deliver the queried data itself—if the queried data is available in the index. Example:

CREATE TABLE demo
     ( last_name  VARCHAR(255)
     , first_name VARCHAR(255)
     -- more columns and constraints
     );

CREATE INDEX ios_demo
    ON demo (last_name, first_name);

SELECT last_name, first_name
  FROM demo
 WHERE last_name = ?;

If the database uses the IOS_DEMO index to find the rows in question, it can directly use the first name that is stored along with the last name in the index and deliver the queries’ result right away without accessing the actual table. That saves a lot of IO—especially when you are selecting more than a few rows. This technique is even more useful (important) for databases that use clustered indexes like SQL Server or MySQL+InnoDB because they have an extra level of indirection between the index and the table data.

Did you see the crucial prerequisite to make an Index-Only Scan happen? Or asking the other way around: what’s a good way to make sure you’ll never get an Index-Only Scan? Yes, selecting all columns is the most easy yet effective Index-Only-Scan-Preventer. And now guess who is selecting all the columns all the time on behalf of you? Your good old friend the ORM-tool.

This is where the tool support is really getting sparse. Selecting partial objects is hardly supported by ORMs. If it is supported then often in a very inconvenient way that doesn’t give runtime control over the columns you’d like to select. And for gods sake, let’s forget about the documentation for a moment. To say it straight: using Index-Only Scans is a pain in the a** with most ORMs.

Besides Index-Only Scans, not selecting everything can also improve sorting, grouping and join performance because the database can save memory that way.

What we would actually need to get decent database performance is a way to declare which information we are going to need in the upcoming unit of work. Malicious tongues might now say “that’s SQL!” And it’s true (in my humble opinion). However, I do acknowledge that ORMs reduce boilerplate code. Luckily they often offer an easier language than SQL: their whatever-QL (like HQL). Although the syntactic difference between SQL and whatever-QL is often small, the semantic difference is huge because they don’t work on tables and columns but on objects and classes. That avoids a lot of typing and feels more natural to developers from the object world. Of course, the whatever-QL needs to support everything we need—also partial objects like in this Doctrine example.

After all, I think ORM documentation should be more like this: first introduce whatever-QL as simplified SQL dialect that is the default way to query the database. Those methods that are currently mentioned first (e.g., .find or .byId) should be explained as a shortcut if you really need only one row from a single table.

If you like my way to explain things, you’ll love SQL Performance Explained.

I need your help!

Hi!

Would you please spare me a few minutes and help me a little bit?

As you might know—or maybe not—I’m making my living as an independent trainer and consultant. Up till now I’ve only delivered on-site training at the clients’ site, but I though it makes sense to offer open trainings as well so that singe participants can also join. For that I’d need to know how many people would like to join such a training, where they are physically located, and which database they are using. So, I’ve set up a short survey:

http://winand.at/services/sql-performance-training/survey

It’s just one form and won’t take much time. Naturally, you are under no obligation if you fill out the form, and as a way of saying thank you for your time, I’ll be drawing three participants to receive a DON’T PANIC towel—the perfect preparation for the upcoming Towel Day on 25th May. So take a minute to complete the survey now because the deadline is 26th of April!

Thanks,

-markus

Party time

Good news everybody. I’m in a good mood today ;) I’ll explain. First of all, the French edition of SQL Performance Explained is almost done and will ship at the end of the month (pre-order now ;). It has a different name (SQL : Au cœur des performances) but is otherwise the same. So, that triggered some kind of "project done" feeling.

Further, I’ve given a few good trainings recently and looking forward for some more in the next weeks. It’s just awesome when the audience comes up with great questions, but it’s even more awesome if they come up with great conclusions that proof their correct understanding. That triggers a "missing accomplished" feeling.

So, as the pre-sale of the French edition is running on a 10% discount, I though why not giving a general 10% discount to celebrate the French edition? And here it is: ’aucoeur’. It’s only valid for order placed directly at http://sql-performance-explained.com/ (not valid on Amazon) and expires on March 28 (the release date of the French edition).

Further, I’ll repeat the ’retweet to win’ campaign every week during March. It goes like this: I’ll tweet something like ’Retweet this for a chance to win’. After a while, I’ll select one of the retweeters as winner. You can’t win twice (sorry @d_gustafsson). So, watch out, the first giveaway tweet comes soon.

Better Performance with PostgreSQL

Dalibo, the company where Guillaume Lelarge works (he does the French translation of SQL Performance Explained) invited me to give a talk at their (mostly French) for-free conference “Better Performance with PostgreSQL” on 28th March in Paris. Seats are limited: if you’d like to go there, register now!

And if you don’t speak French, there are other good news for you. SQL Performance Explained is currently on sale at Amazon.co.uk for £25.00 (you save £1.99, -7%). Please note that this is the free-delivery threshold for these countries: Belgium, Denmark, Luxembourg, Netherlands, Andorra, Finland, Gibraltar, Greece, Iceland, Ireland, Italy, Liechtenstein, Norway, Portugal, San Marino, Spain, Sweden, Vatican city and Poland. They will also ship for free to UK of course.

FOSDEM Impressions

Last weekend I’ve visited the Free and Open source Software Developers’ European Meeting (FOSDEM) in Brussels, Belgium. It is a huge and free conference covering countless open source topics such as Open/Libre Office, Microkernels, MySQL, NoSQL, PostgreSQL, OWASP, and much more.

On the day before FOSDEM, Friday, I’ve been at the PgDay giving my talk “Pagination Done the PostgreSQL Way” [slides]. I think it was well received but I’m still curious to get the feedback from the feedback forms. So, if you have been there, please fill in the feedback form!

Here you find a short list of notable talks. I’ve actually attended more, I’m just mentioning the most impressive ones.

The GNU/Hurd architecture, nifty features, and latest news

Hurd is still being released in two years. This has been funny 15 years ago, but is quite sad today. However, they aim to get a fully supported Debian distro based on Hurd by that time. Good luck. It seems that current virtualization developments (e.g. Linux containers) are a very good fit for the Hurd architecture, effectively not needing any changes to support containers natively.

Lightning talk: Spoiling and Counter-spoiling

An attempt to make a coding contest that honours maintainable code. The problem: how to measure it? The solution: make three contests: (1) one for writing maintainable code; (2) one for incorporating difficult bugs into that code; (3) and a last one to fix those bugs again. Add some statistics to that and you know who’s code had the best maintainability by measuring how long it took to find and fix the bugs. Funny idea, but not yet executed.

Practical Security for developers, using OWASP ZAP

Promising tool for security checks (brute force). Video available at the homepage.

Lightning talk: The C2 programming language

Another successor of C. It’s a minimalistic approach to just change what really hurts in C. That are a few syntax issues and the preprocessor (#include). At a very early stage. See c2lang.org.

How to Share a Trademark

About legal issues (e.g., Trademark vs. Registered Trademark). Some example cases from the open source scene.

Lightning talk: BIND 10: DNS by Cooperating Processes

Have I ever mentioned that I used to be the DNS admin of an Austrian ISP? It’s more than 10 years ago, but I still hate BIND ;) Privately I’m only using djbdns since then. One of the fundamental design principles of djbdns is that there is a separation of concerns (e.g., serving a zone authoritatively is different from acting as caching resolver). Guess what’s gonna be introduced in BIND 10 ;)

PostGIS 2.0 and beyond

Really interesting talk where PostGIS is coming from (1.5 releases) and how 2.0 differs from it. One of the nice things about PostGIS presentations is that they always have nice visualizations (by definition). There was also a PostGIS 3D talk at PgDay.

When and How to Take Advantage of New Optimizer Features in MySQL 5.6

I’ve been to another MySQL talks as well and I must say they don’t feel like FOSS talks. It’s just Oracle presenting. However, the topics presented are no surprise (e.g. Index Condition Pushdown, a.k.a. Index-Filterpredicates). The scary thing about this talk was that the examples almost always needed hits to work. Sure you could say that these are just made up examples, but I think it just a proof of the optimizers weakness if it needs all the time hints to do the “right thing”.

A real-time architecture using Hadoop & Storm

Presenting the “classical” batch/online mixture to handle BigData: Running Hadoop/MapReduce in Batches and processing the new data online (in memory). To get the full picture, you must query both of them.

Top Tweets January 2013

This is a short collection of the best tweets on @SQLPerfTips and/or @MarkusWinand for those who missed them and those who don’t use twitter at all.

As I’m late posting the January tweets, I also include an awesome picture taken at FOSDEM (February, technically).

About the Author

As an author, trainer, and coach Markus Winand specializes in helping developers cope with SQL performance issues. He also published the book SQL Performance Explained and tweets his best performance tips via @SQLPerfTips.http://winand.at/

Recent Questions at Ask.Use-The-Index-Luke.com

2
votes
1
answer
20
views

Table Names in Plural Form

18 hours ago simas 36
table
0
votes
1
answer
163
views

We want to buy the book but I can't

Jul 18 at 21:36 Markus Winand ♦♦ 596
book
0
votes
2
answers
165
views