What’s left of NoSQL?
This is my own, very loose translation of an article I wrote for the Austrian newspaper derStandard.at in October 2013. As the article was very well received and the SQL vs. NoSQL discussion is currently hot again, I thought it might be a good time for a translation.
Back in 2013, The Register reported that Google was betting on SQL again. At first sight this might look like a surprising move, because it was Google’s publications about MapReduce and BigTable, of all things, that gave the NoSQL movement a big boost in the first place. At second glance it turns out that there is a trend to use SQL or similar languages to access data in NoSQL systems, and that’s not even a new trend. However, it raises a question: What remains of NoSQL if we add SQL again? To answer this question, I need to start with a brief summary of the history of databases.
Top Dog SQL
It was undisputed for decades: store enterprise data in a relational database and use the programming language SQL to process the stored data.
The theoretical foundation for this approach, the relational model, was laid down as early as 1970. The first commercial representative of this new database type became available in 1979: the Oracle Database in release 2. Today’s appearance of relational databases was mainly shaped in the 1980s. The most important milestones were the formalization of the ACID criteria (Atomicity, Consistency, Isolation, Durability), which prevent accidental data corruption, and the standardization of the query language SQL, first by ANSI and one year later by ISO. From that time on, SQL and ACID compliance were desirable goals for database vendors.
At first glance there were no major improvements during the following decades. The result: SQL and the relational model are sometimes called old or even outdated technology. On closer inspection, SQL databases evolved into mature products during that time. This process took so long because SQL is a declarative language and was way ahead of the technical possibilities of the 1980s. “Declarative language” means that SQL developers don’t need to specify the steps that lead to the desired result, as they would with imperative programming. They just describe the desired result itself. The database automatically finds the most efficient way to produce this result, which is by no means trivial, so early implementations only mastered it for simple queries. Over these decades, however, the hardware got more powerful and the software more advanced, so that modern databases deliver good results for complex queries too (see here if it doesn’t work for you).
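To make the difference between declarative and imperative concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The tables, names, and the threshold of 25 are invented for this illustration. Both functions answer the same question (which Austrian customers placed an order of at least 25?), but only the first one spells out how to get there.

```python
import sqlite3

# Hypothetical sample data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
    INSERT INTO customers VALUES (1, 'Ada', 'AT'), (2, 'Bob', 'DE'), (3, 'Cy', 'AT');
    INSERT INTO orders VALUES (1, 1, 50), (2, 2, 70), (3, 3, 20), (4, 1, 30);
""")


def imperative_names(conn):
    """Spell out every step: load, look up, filter, deduplicate, sort."""
    customers = {
        cid: (name, country)
        for cid, name, country in conn.execute("SELECT id, name, country FROM customers")
    }
    names = set()
    for customer_id, amount in conn.execute("SELECT customer_id, amount FROM orders"):
        name, country = customers[customer_id]
        if country == "AT" and amount >= 25:
            names.add(name)
    return sorted(names)


def declarative_names(conn):
    """Describe the result only; the database decides how to compute it."""
    rows = conn.execute("""
        SELECT DISTINCT c.name
          FROM customers c
          JOIN orders o ON o.customer_id = c.id
         WHERE c.country = 'AT'
           AND o.amount >= 25
         ORDER BY c.name
    """)
    return [name for (name,) in rows]
```

Both functions return the same list. The declarative version leaves join order and access paths to the database, which is the hard part mentioned above.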
It is especially important to note that the capabilities of SQL are not limited to storing and fetching data. In fact, SQL was designed to make refining and transforming data easy. Without any doubt, that was an important factor that made SQL and the relational model the first choice when it comes to databases.
And Then: NoSQL
Despite the omnipresence of SQL, a new trend emerged during the past few years: NoSQL. This term alone struck a nerve with many developers and caused a rapid spread that ultimately turned into a religious war. On one side stood SQL as a symbol for outdated, slow, and expensive technology; on the other, a heterogeneous collection of new, highly scalable, free software that is united by nothing more than the umbrella brand “NoSQL.”
One possible explanation for the lost appreciation of SQL among developers is the increasing popularity of object-relational mapping tools (ORM), which generally tend to reduce SQL databases to pure storage media (“persistence layer”). These tools do not encourage refining data with SQL; they considerably hinder it. Used excessively, they lead to step-by-step processing in the application. Under these circumstances SQL does indeed not deliver any additional value, and it becomes understandable why so many developers sympathise with the term NoSQL.
But the problem is that the term NoSQL was not aimed against SQL in the first place. To make that clear, the term was later defined to mean “not only SQL.” Thus, NoSQL is about complementary alternatives. To be precise, it is not even about alternatives to SQL but about alternatives to the relational model and ACID. In the meantime, the CAP theorem revealed that the ACID criteria will inevitably reduce the availability of distributed databases. That means that traditional, ACID-compliant databases cannot benefit from the virtually unlimited resources available in cloud environments. This is what many NoSQL systems provide a solution for: instead of sticking to the very rigid ACID criteria to keep data 100% consistent all the time, they accept temporary inconsistencies to increase availability in a distributed environment. Simply put: when in doubt, they prefer to deliver wrong (old) data rather than no data. A more correct but less catchy term would therefore be NoACID.
Deploying such systems only makes sense for applications that don’t need strict data consistency. These are quite often applications in the social media field that can still fulfil their purpose with old data and even accept the loss of some updates in case of a service interruption. These are also applications that could possibly need the unlimited scalability offered by cloud infrastructure. If a central database is sufficient, however, the CAP theorem does not imply a compelling reason to abandon the safety of ACID. Even then, using NoSQL systems can still make sense in domains where SQL doesn’t provide sufficient power to process the data. Interestingly, this problem is also very dominant in the social media field: although it is easy to express a social graph in a relational model, it is rather cumbersome to analyse the edges using SQL.
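The following sketch contrasts the two styles, again with Python’s sqlite3 module. The Order class is a hand-rolled stand-in for what a mapping tool would produce, not the API of any real ORM, and the table and its rows are made up.

```python
import sqlite3
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical mapped object, standing in for an ORM entity."""
    id: int
    country: str
    amount: int


# Invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, country TEXT, amount INTEGER);
    INSERT INTO orders VALUES (1, 'AT', 50), (2, 'DE', 70), (3, 'AT', 20), (4, 'DE', 5);
""")


def totals_in_application(conn):
    """Persistence-layer style: fetch every row, then aggregate step by step."""
    orders = [Order(*row) for row in conn.execute("SELECT id, country, amount FROM orders")]
    totals = defaultdict(int)
    for order in orders:
        totals[order.country] += order.amount
    return dict(totals)


def totals_in_sql(conn):
    """Let the database refine the data; only the result leaves the database."""
    return dict(conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country"))
```

The results are identical, but the first variant transfers every order to the application, while the second transfers one row per country.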
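A toy model may help to picture “old data rather than no data.” The class below is purely illustrative and not modelled on any particular NoSQL product: writes reach a primary copy immediately and a second copy only when replicate() is called, which stands in for asynchronous replication.

```python
class EventuallyConsistentStore:
    """Toy model: writes hit the primary at once and the replica later."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # changes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, key):
        # The replica always answers, even if it has not seen the latest write.
        return self.replica.get(key)

    def replicate(self):
        # Stand-in for asynchronous replication catching up.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()


store = EventuallyConsistentStore()
store.write("status", "v1")
store.replicate()
store.write("status", "v2")
stale = store.read_from_replica("status")  # replica has not caught up yet
store.replicate()
fresh = store.read_from_replica("status")  # replica has caught up
```

Here stale is the old value "v1" and fresh is "v2". The read in between succeeds, but it delivers outdated data.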
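As an illustration of the social graph case, the sketch below stores a made-up “follows” relation in a single table and asks who can be reached from a person within a given number of hops. It assumes a database that supports recursive common table expressions, as the SQLite bundled with current Python versions does. Storing the edges takes one line. Traversing them takes a recursive query.

```python
import sqlite3

# Invented sample graph: ann -> ben -> cat -> dan, and ann -> eve.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (src TEXT, dst TEXT);
    INSERT INTO follows VALUES
        ('ann', 'ben'), ('ben', 'cat'), ('cat', 'dan'), ('ann', 'eve');
""")


def reachable(conn, start, max_hops):
    """Return {person: fewest hops} for everyone within max_hops of start."""
    rows = conn.execute("""
        WITH RECURSIVE reach(person, hops) AS (
            SELECT dst, 1 FROM follows WHERE src = ?
            UNION
            SELECT f.dst, r.hops + 1
              FROM reach r
              JOIN follows f ON f.src = r.person
             WHERE r.hops < ?
        )
        SELECT person, MIN(hops) FROM reach GROUP BY person ORDER BY person
    """, (start, max_hops))
    return dict(rows)
```

Calling reachable(conn, 'ann', 2) finds ben and eve at one hop and cat at two. Questions such as shortest paths or mutual connections quickly need more of this machinery.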
Nevertheless: Back to SQL
Notwithstanding the above, SQL is still a very good tool to answer countless questions. This is also true for questions that could not be foreseen at the time of application design—a problem many NoSQL deployments face after the first few years. That’s probably also the cause for the huge demand for powerful and generic query languages like SQL.
Another aspect that strengthens the trend back to SQL goes to the heart of NoSQL: without ACID it is very difficult to write reliable software. Google said that very clearly in their paper about the F1 database: without ACID “we find developers spend a significant fraction of their time building extremely complex and error-prone mechanisms to cope with eventual consistency and handle data that may be out of date.” Apparently you only learn to appreciate ACID once you have lost it.
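A small sketch of what ACID takes off the developer’s shoulders, using atomicity as the example. It relies on Python’s sqlite3 module, whose connection object commits or rolls back when used as a context manager. The accounts and amounts are invented.

```python
import sqlite3

# Invented sample data; the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 150 from alice to bob fails on the second update, and the credit already given to bob is rolled back with it. Without transactions, the application would have to detect and undo that half-finished state itself.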
What’s left of NoSQL?
If SQL and ACID become the flavor of the day again, you may wonder what’s left of the NoSQL movement. Did it really waste half a decade, as Jack Clark speculated at The Register? One thing we can say for sure is that SQL and ACID conformity are still desirable goals for database vendors. Further, we know that there is a trade-off between ACID and scalability in cloud environments.
Of course, the race for the Holy Grail of the ideal balance between ACID and scalability has already begun. The NoSQL systems entered the field from the corner of unlimited scalability and started adding tools to control data consistency, such as causal consistency. There are also newcomers that use the term NewSQL to jump on the bandwagon by developing completely new SQL databases that bring ACID and SQL from the beginning but use NoSQL ideas internally to improve scalability.
And what about the established database vendors? They want to make us believe they are doing NoSQL too and release products like the “Oracle NoSQL Database” or “Windows Azure Table Storage Service.” The intention to create these products for the sole purpose of riding the hype is so striking that one must wonder why they treat neither NoSQL nor NewSQL as a serious threat to their business. When looking at the Oracle Database in its latest release 12c, we can even see the opposite trend. Although the version suffix “c” is supposed to express its cloud capability, it doesn’t change the fact that the killer feature of this release serves a completely different need: the easy and safe operation of multiple databases on a single server. That’s the exact opposite of what many NoSQL systems aim for: running a giant database on a myriad of cheap commodity servers. Virtualization is a much bigger trend than scale-out.
Is it even remotely possible that the established database vendors made such a fundamental misjudgment? Or is it more likely that NoSQL only serves a small market niche? How many companies really need to cope with data at the scale of Google, Facebook or Twitter? Incidentally, these are three companies that grew up on the open source database MySQL. One might believe the success of NoSQL is also based on the fact that it solves a problem that everybody would love to have. In reality this problem is only relevant to a very small but prominent community, which managed to get a lot of attention. After all, also judging by the fact that the big database vendors don’t show a serious engagement, NoSQL is nothing more than a storm in a teacup.