When clients tell me about their plans to invest in SSD storage for their database, they often look at me like a doctor breaking the news of a deadly disease to a patient. I didn’t discuss this with my clients until recently, when one client simply asked me straight out: “As an SQL tuning guy, are you afraid of SSD because it will kill your job?” Here is what I told that client.
Generally, people seem to believe that SSDs are simply much faster than HDDs. As a matter of fact, this is only partially true because, as I also mentioned in Chapter 3, performance has two dimensions: response time and throughput. Although SSDs tend to deliver more performance on both axes, keep in mind that the throughput of SSDs is “only” about five times as high as that of HDDs. That’s because HDDs are not bad at sequential read/write operations anyway.
Sure enough, storage that is five times faster makes many performance problems go away…for a while…until you have five times more data. For a decently growing startup, it might take just a few months until you have the same problem again. However, this is not the crucial point here. The crucial point is that SSDs essentially fix the one thing HDDs are really bad at: response time. Because SSDs have no moving parts, they respond about fifty times faster than HDDs. Well, that really helps solve problems for a while.
However, there is a catch, maybe even a Catch-22: if you want the factor-50 speed-up of SSDs, you’d better avoid reading large chunks of sequential data, because that’s where you only gain a factor-five improvement. To put that into a database context: if you are doing many full table scans, you won’t get the full potential of SSDs. Index lookups, on the other hand, tend to cause many random IO operations and can thus benefit from the fast response time of SSDs. The fun part is that properly indexed databases benefit more from SSDs than poorly indexed ones. But guess who is most desperately betting on SSDs to solve their performance problems? People with proper indexes, or those without them?
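To make the factor-5 versus factor-50 argument concrete, here is a deliberately simplified cost model, a sketch rather than a benchmark. It assumes that every random IO pays one full access latency and that sequential data streams at the drive’s throughput, and it applies the rough ratios from the text (5x throughput, 50x response time). The HDD baseline of 10 ms latency and 100 MB/s, and the function names io_time and ssd_speedup, are hypothetical and chosen only for illustration.

```python
# Rough SSD-vs-HDD ratios quoted in the text (approximate, not measured here).
THROUGHPUT_GAIN = 5   # sequential read/write: SSD ~5x HDD
LATENCY_GAIN = 50     # random access response time: SSD ~50x HDD


def io_time(random_ios, seq_bytes, latency_s, throughput_bps):
    """Toy model: each random IO costs one access latency,
    sequential bytes stream at the given throughput."""
    return random_ios * latency_s + seq_bytes / throughput_bps


def ssd_speedup(random_ios, seq_bytes,
                hdd_latency_s=0.010,          # hypothetical HDD: 10 ms per random IO
                hdd_throughput_bps=100e6):    # hypothetical HDD: 100 MB/s sequential
    """Return how many times faster the same IO pattern runs on the modeled SSD."""
    hdd = io_time(random_ios, seq_bytes, hdd_latency_s, hdd_throughput_bps)
    ssd = io_time(random_ios, seq_bytes,
                  hdd_latency_s / LATENCY_GAIN,
                  hdd_throughput_bps * THROUGHPUT_GAIN)
    return hdd / ssd


if __name__ == "__main__":
    # Full table scan: 1 GB read sequentially, no random IO.
    print(f"full table scan: {ssd_speedup(0, 1e9):.1f}x")      # 5.0x
    # Index lookups: 1,000 random single-block reads, no bulk reads.
    print(f"index lookups:   {ssd_speedup(1_000, 0):.1f}x")    # 50.0x
    # A mix of both lands in between, dragged toward 5x by the scan.
    print(f"mixed:           {ssd_speedup(1_000, 1e9):.1f}x")  # 9.1x
```

The baseline values cancel out for the two pure cases, so only the ratios matter there; for the mixed case, the sequential part dominates the SSD runtime, which is why a scan-heavy workload stays much closer to the factor five than to the factor fifty.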
The story goes on: which database operation do you think causes the most random IO operations? Of course it’s our old friend the join: the sole purpose of a join is to gather many small data fragments from different places and combine them into the result we want. Joins can also benefit greatly from SSDs. SSDs actually void one of the arguments NoSQL folks often bring up against relational databases: with SSDs, it doesn’t matter that much whether you fetch data from one place or from many places.
So, to sum up what I told my client: no, as an indexing-focused SQL performance guy, I’m absolutely not afraid of SSDs.
If you like my way of explaining things, you’ll love SQL Performance Explained.