Translations
A Japanese translation of this article is available here.
Yesterday evening I tweeted: “MongoDB seems to be as Bad for NoSQL as MySQL is for SQL.” Unfortunately, I tweeted without context. But I guess I couldn’t have given all the required context in a single tweet anyway, so I’m dedicating this post to it. I hope this answers some of the questions I’ve got in response to the tweet.
First of all, I think everybody should know that I’m not a NoSQL fanboy, yet I’m open to the polyglot persistence idea. This distinction doesn’t seem to make sense if you read NoSQL as “not only SQL” (as you are supposed to do). However, I believe there are NoSQL systems out there that greatly benefit from the idea that SQL is bad and not using SQL is good. In other words, they offer “not using SQL” as their main advantage. MongoDB seems to be one of them. Just my perception.
But if I don’t like NoSQL, then I should like MySQL? Not exactly. In my eyes, MySQL has done great harm to SQL because many of the problems people associate with SQL are in fact just MySQL problems. One of the more important examples is that MySQL is rather poor at joining because it only supports nested loops joins. Most other SQL databases implement the hash join and sort/merge join algorithms too, and both deliver better performance for non-tiny data sets. Considering the wide adoption of MySQL (“The most popular open source database”) and the observation that many people move away from SQL because “joins are slow,” it isn’t far-fetched to say that an implementation limitation of MySQL pushes people towards NoSQL.
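To make the difference between the join algorithms a bit more tangible, here is a minimal Python sketch of a naive nested loops join next to a hash join. It is a toy over in-memory lists of dicts, not how MySQL or any other database actually implements joins (real nested loops joins usually use an index for the inner lookup). The function names and the `customers`/`orders` sample data are made up for illustration.

```python
from collections import defaultdict


def nested_loop_join(left, right, key):
    """Compare every left row with every right row.

    Without an index this does len(left) * len(right) comparisons.
    """
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[key] == r[key]
    ]


def hash_join(left, right, key):
    """Build a hash table on one input, then probe it once per row of the other.

    This does roughly len(left) + len(right) steps, plus the matches.
    """
    buckets = defaultdict(list)
    for r in right:  # build phase
        buckets[r[key]].append(r)
    return [
        {**l, **r}
        for l in left  # probe phase
        for r in buckets.get(l[key], [])
    ]


# Hypothetical sample data, only for illustration.
customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bob"}]
orders = [
    {"cust_id": 1, "item": "book"},
    {"cust_id": 1, "item": "pen"},
    {"cust_id": 3, "item": "cup"},
]

# Both strategies return the same rows; only the amount of work differs.
assert nested_loop_join(customers, orders, "cust_id") == hash_join(
    customers, orders, "cust_id"
)
```

With two and three rows the difference is invisible, of course. It only starts to matter when both inputs get large and there is no suitable index for the inner side, which is the “non-tiny data sets” case mentioned above.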
Now let’s look at MongoDB. I think the direct competition between MongoDB and MySQL became most obvious in the epic video “MongoDB is Web Scale.” In the meantime, MongoDB even claims to be “the leading NoSQL database.” Does that sound like “the most popular open source database”? Nevertheless, MongoDB has disappointed many people because it couldn’t live up to its promise of “web scale” (example: global write lock up to release 2.2).
The next piece in the puzzle that eventually caused me to tweet was a funny tweet by Gwen (Chen) Shapira (she’s an Oracle DB consultant):
#mongoDB : the big data platform that is challenging to scale over 100GB. http://blog.mongodb.org/post/62899600960/scaling-advice-from-mongohq
Note that the link was broken for a while (the post originally appeared on Sep 30, then disappeared, but has been back online at a different URL since Oct 2). The article is about handling MongoDB when it grows above 100GB. It gives me the impression that scaling MongoDB to that size is a serious issue. Even though there is no exact definition of “web scale,” I guess most people would assume that it should be easy to scale MongoDB to 100GB. 100GB is not big data nowadays. 100GB can be easily managed with most SQL DBs (joining in MySQL could be a problem). It was really funny to see this post on the MongoDB blog. Chen’s tweet nailed it.
At this point, I was once more thinking about the “misspent half-decade” mentioned by Jack Clark in his article “Google goes back to the future with SQL F1 database.” But as mentioned before, I like the idea of polyglot persistence. I’m not saying NoSQL is bullshit, and certainly not just because a single implementation fails to deliver. That would be like saying SQL is bullshit because MySQL is bad at joining. On the contrary, it reminded me of how Alex Popescu lost his temper in his post “The premature return to SQL” last Friday. His response to the “misspent half-decade” was:
Just take a second a think what we got during this misspent half-decade: Redis, Cassandra, Riak, a multi-parallel fully programmatic way to process data, Cascading, Pig, Cypher, ReQL and many more tools, languages, and APIs for processing data.
Well, I don’t know all of these but I do realize that some of them are interesting tools to have in the tool box. Further, I’ve been following Alex Popescu long enough to know that he is rather reflective on NoSQL (the title of his post being an exception). That’s why I came back to his post to see if he mentioned “the leading NoSQL database” in his list. He didn’t. I don’t think it’s a coincidence.
At this point it was inevitable to see MongoDB as a popular, yet poor representative of its species—just like MySQL is.
Links
MongoDB mocked after posting “100GB Scaling Checklist”
Also covering how the re-posted checklist was changed as compared to the original post :)
Baron Schwartz “MySQL isn’t limited to nested-loop joins”
Note the comments too. There is doubt whether the mentioned features can compete with hash joins.