The two top performance problems caused by ORM tools
“ORMs are not entirely useless…” I just tweeted in response to a not exactly constructive message that we should fire developers who want to use ORMs. But than I continued in a not exactly constructive tone myself when I wrote "…the problem is that the ORM authors don’t know anything about database performance”.
Well, I don’t think they “don’t know anything” but I wonder why they don’t provide decent solutions (including docs) for the two most striking performance problems caused by ORMs? Here they are:
- The infamous N+1 selects problem
This problem is actually well-known and I believe there are working solutions in all recent ORMs. The problem that remains is about documentation and culture: although there are solutions, many developers are not aware of them or still live in the “joins are slow—let’s avoid them” universe. The N+1 selects problem seems to be on the decline but I still think ORMs’ documentation should put more emphasis on joins.
For me, it looks very similar to the SQL injection topic: each and every database access layer provides bind parameters but the documentation and books just show examples using literal values. The result: SQL injection was the most dangerous weakness in the CWE/SANS Top 25 list. Not because the tools don’t provide proper ways to close that hole, but because the examples don’t use them consistently.
- The hardly-known Index-Only Scan
Although the Index-Only Scan is one of the most powerful SQL tuning techniques, it seems to be hardly known by developers (related SO question I was involved in recently). However, I’ll try to make the story short.
Whenever a database uses an index to find the requested data quickly, it can also use the information in the index to deliver the queried data itself—if the queried data is available in the index. Example:
CREATE TABLE demo ( last_name VARCHAR(255) , first_name VARCHAR(255) -- more columns and constraints ); CREATE INDEX ios_demo ON demo (last_name, first_name); SELECT last_name, first_name FROM demo WHERE last_name = ?;
If the database uses the
IOS_DEMOindex to find the rows in question, it can directly use the first name that is stored along with the last name in the index and deliver the queries’ result right away without accessing the actual table. That saves a lot of IO—especially when you are selecting more than a few rows. This technique is even more useful (important) for databases that use clustered indexes like SQL Server or MySQL+InnoDB because they have an extra level of indirection between the index and the table data.
Did you see the crucial prerequisite to make an Index-Only Scan happen? Or asking the other way around: what’s a good way to make sure you’ll never get an Index-Only Scan? Yes, selecting all columns is the most easy yet effective Index-Only-Scan-Preventer. And now guess who is selecting all the columns all the time on behalf of you? Your good old friend the ORM-tool.
This is where the tool support is really getting sparse. Selecting partial objects is hardly supported by ORMs. If it is supported then often in a very inconvenient way that doesn’t give runtime control over the columns you’d like to select. And for gods sake, let’s forget about the documentation for a moment. To say it straight: using Index-Only Scans is a pain in the a** with most ORMs.
Besides Index-Only Scans, not selecting everything can also improve sorting, grouping and join performance because the database can save memory that way.
What we would actually need to get decent database performance is a way to declare which information we are going to need in the upcoming unit of work. Malicious tongues might now say “that’s SQL!” And it’s true (in my humble opinion). However, I do acknowledge that ORMs reduce boilerplate code. Luckily they often offer an easier language than SQL: their whatever-QL (like HQL). Although the syntactic difference between SQL and whatever-QL is often small, the semantic difference is huge because they don’t work on tables and columns but on objects and classes. That avoids a lot of typing and feels more natural to developers from the object world. Of course, the whatever-QL needs to support everything we need—also partial objects like in this Doctrine example.
After all, I think ORM documentation should be more like this: first introduce whatever-QL as simplified SQL dialect that is the default way to query the database. Those methods that are currently mentioned first (e.g.,
.byId) should be explained as a shortcut if you really need only one row from a single table.