The nested loops join is the most fundamental join algorithm. It works like using two nested queries: the outer or driving query to fetch the results from one table and a second query for each row from the driving query to fetch the corresponding data from the other table.
You can actually use “nested selects” to implement the nested loops algorithm on your own. Nevertheless that is a troublesome approach because network latencies occur on top of disk latencies—making the overall response time even worse. “Nested selects” are still very common because it is easy to implement them without being aware of it. Object-relational mapping (ORM) tools are particularly “helpful” in this respect…to the extent that the so-called N+1 selects problem has gained a sad notoriety in the field.
On my Own Behalf
I offer training, tuning and consulting. Buying my book “SQL Performance Explained” (from €9.95) also supports my work on this website.
The following examples show these “accidental nested select” joins
produced with different ORM tools. The examples search for employees whose
last name starts with 'WIN'
and fetches all
SALES
for these employees.
- Java
The JPA example uses the CriteriaBuilder interface.
CriteriaBuilder queryBuilder = em.getCriteriaBuilder(); CriteriaQuery<Employees> query = queryBuilder.createQuery(Employees.class); Root<Employees> r = query.from(Employees.class); query.where( queryBuilder.like( queryBuilder.upper(r.get(Employees_.lastName)), "WIN%" ) ); List<Employees> emp = em.createQuery(query).getResultList(); for (Employees e: emp) { // process Employee for (Sales s: e.getSales()) { // process sale for Employee } }
Hibernate JPA 3.6.0 generates N+1
select
queries:select employees0_.subsidiary_id as subsidiary1_0_ -- MORE COLUMNS from employees employees0_ where upper(employees0_.last_name) like ?
select sales0_.subsidiary_id as subsidiary4_0_1_ -- MORE COLUMNS from sales sales0_ where sales0_.subsidiary_id=? and sales0_.employee_id=?
select sales0_.subsidiary_id as subsidiary4_0_1_ -- MORE COLUMNS from sales sales0_ where sales0_.subsidiary_id=? and sales0_.employee_id=?
- Perl
The following sample demonstrates Perl’s DBIx::Class framework:
my @employees = $schema->resultset('Employees') ->search({'UPPER(last_name)' => {-like=>'WIN%'}}); foreach my $employee (@employees) { # process Employee foreach my $sale ($employee->sales) { # process Sale for Employee } }
DBIx::Class 0.08192 generates N+1
select
queries:SELECT me.employee_id, me.subsidiary_id , me.last_name, me.first_name, me.date_of_birth FROM employees me WHERE ( UPPER(last_name) LIKE ? )
SELECT me.sale_id, me.employee_id, me.subsidiary_id , me.sale_date, me.eur_value FROM sales me WHERE ( ( me.employee_id = ? AND me.subsidiary_id = ? ) )
SELECT me.sale_id, me.employee_id, me.subsidiary_id , me.sale_date, me.eur_value FROM sales me WHERE ( ( me.employee_id = ? AND me.subsidiary_id = ? ) )
- PHP
The Doctrine sample uses the query builder interface:
$qb = $em->createQueryBuilder(); $qb->select('e') ->from('Employees', 'e') ->where("upper(e.last_name) like :last_name") ->setParameter('last_name', 'WIN%'); $r = $qb->getQuery()->getResult(); foreach ($r as $row) { // process Employee foreach ($row->getSales() as $sale) { // process Sale for Employee } }
Doctrine 2.0.5 generates N+1
select
queries:SELECT e0_.employee_id AS employee_id0 -- MORE COLUMNS FROM employees e0_ WHERE UPPER(e0_.last_name) LIKE ?
SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS FROM sales t0 WHERE t0.subsidiary_id = ? AND t0.employee_id = ?
SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS FROM sales t0 WHERE t0.subsidiary_id = ? AND t0.employee_id = ?
SALES
table with nested selects. This effect is known as the
“N+1 selects problem” or shorter the “N+1 problem” because it executes N+1 selects in total if the driving
query returns N rows.Enabling SQL Logging
Enable SQL logging during development and review the generated SQL statements.
- DBIx::Class
export DBIC_TRACE=1
in your shell.- Doctrine
Only on source code level—don’t forget to disable this for production. Consider building your own configurable logger.
$logger = new \Doctrine\DBAL\Logging\EchoSqlLogger; $config->setSQLLogger($logger);
- Hibernate (native)
<property name="show_sql">true</property>
inApp.config
orhibernate.cfg.xml
- JPA
In
persistence.xml
but depending on the JPA provider—e.g., for eclipselink, Hibernate and OpenJPA:<property name="eclipselink.logging.level" value="FINE"/> <property name="hibernate.show_sql" value="TRUE"/> <property name="openjpa.Log" value="SQL=TRACE"/>
Most ORMs offer a programmatic way to enable SQL logging as well. That involves the risk of accidentally deploying the setting in production.
Even though the “nested selects” approach is an anti-pattern, it
still explains the nested loops join pretty well. The
database executes the join exactly as the ORM tools above. Indexing for a
nested loops join is therefore like indexing for the above shown select
statements. That is a function-based index on the table
EMPLOYEES
and a concatenated
index for the join predicates on the SALES
table:
CREATE INDEX emp_up_name ON employees (UPPER(last_name))
CREATE INDEX sales_emp ON sales (subsidiary_id, employee_id)
An SQL join is still more efficient than the nested selects approach—even though it performs the same index lookups—because it avoids a lot of network communication. It is even faster if the total amount of transferred data is bigger because of the duplication of employee attributes for each sale. That is because of the two dimensions of performance: response time and throughput; in computer networks we call them latency and bandwidth. Bandwidth has only a minor impact on the response time but latencies have a huge impact. That means that the number of database round trips is more important for the response time than the amount of data transferred.
Tip
Execute joins in the database.
Most ORM tools offer some way to create SQL joins. The
so-called eager fetching mode is probably the most
important one. It is typically configured at the property level in the
entity mappings—e.g., for the employees
property in the
Sales
class. The ORM tool will then always join the
EMPLOYEES
table when accessing the SALES
table.
Configuring eager fetching in the entity mappings only makes sense if you
always need the employee details along with the sales data.
Eager fetching is counterproductive if you do not need the child
records every time you access the parent object. For a telephone directory
application, it does not make sense to load the SALES
records
when showing employee details. You might need the related sales data in
other cases—but not always. A static configuration is no solution.
For optimal performance, you need to gain full control over joins. The following examples show how to get the greatest flexibility by controlling the join behavior at runtime.
- Java
The JPA
CriteriaBuilder
interface provides theRoot<>.fetch()
method for controlling joins. It allows you to specify when and how to join referred objects to the main query. In this example we use a left join to retrieve all employees even if some of them do not have sales.Warning
JPA and Hibernate return the employees for each sale.
That means that an employee with 30 sales will appear 30 times. Although it is very disturbing, it is the specified behavior (EJB 3.0 persistency, paragraph 4.4.5.3 “Fetch Joins”). You can either manually de-duplicate the parent relation e.g. using a
LinkedHashSet
or use the functiondistinct()
as shown in the example.CriteriaBuilder qb = em.getCriteriaBuilder(); CriteriaQuery<Employees> q = qb.createQuery(Employees.class); Root<Employees> r = q.from(Employees.class); q.where(queryBuilder.like( queryBuilder.upper(r.get(Employees_.lastName)), "WIN%") ); r.fetch("sales", JoinType.LEFT); // needed to avoid duplication of Employee records q.distinct(true); List<Employees> emp = em.createQuery(q).getResultList();
Hibernate 3.6.0 generates the following SQL statement:
select distinct employees0_.subsidiary_id as subsidiary1_0_0_ , employees0_.employee_id as employee2_0_0_ -- MORE COLUMNS , sales1_.sale_id as sale1_0__ from employees employees0_ left outer join sales sales1_ on employees0_.subsidiary_id=sales1_.subsidiary_id and employees0_.employee_id=sales1_.employee_id where upper(employees0_.last_name) like ?
The query has the expected left join but also an unnecessary
distinct
keyword. Unfortunately, JPA does not provide separate API calls to filter duplicated parent entries without de-duplicating the child records as well. Thedistinct
keyword in the SQL query is alarming because most databases will actually filter duplicate records. Only a few databases recognize that the primary keys guarantees uniqueness in that case anyway.The native Hibernate API solves the problem on the client side using a result set transformer:
Criteria c = session.createCriteria(Employees.class); c.add(Restrictions.ilike("lastName", 'Win%')); c.setFetchMode("sales", FetchMode.JOIN); c.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY); List<Employees> result = c.list();
It generates the following query:
select this_.subsidiary_id as subsidiary1_0_1_ , this_.employee_id as employee2_0_1_ -- MORE this_ columns on employees , sales2_.sale_id as sale1_3_ -- MORE sales2_ columns on sales from employees this_ left outer join sales sales2_ on this_.subsidiary_id=sales2_.subsidiary_id and this_.employee_id=sales2_.employee_id where lower(this_.last_name) like ?
This method produces straight SQL without unintended clauses. Note that Hibernate uses
lower()
for case-insensitive queries—an important detail for function-based indexing.- Perl
The following example uses Perl’s DBIx::Class framework:
my @employees = $schema->resultset('Employees') ->search({ 'UPPER(last_name)' => {-like => 'WIN%'} , {prefetch => ['sales']} });
DBIx::Class 0.08192 generates the following SQL statement:
SELECT me.employee_id, me.subsidiary_id, me.last_name -- MORE COLUMNS FROM employees me LEFT JOIN sales sales ON (sales.employee_id = me.employee_id AND sales.subsidiary_id = me.subsidiary_id) WHERE ( UPPER(last_name) LIKE ? ) ORDER BY sales.employee_id, sales.subsidiary_id
Note the
order by
clause—it was not requested by the application. The database has to sort the result set accordingly, and that might take a while.- PHP
The following example uses PHP’s Doctrine framework:
$qb = $em->createQueryBuilder(); $qb->select('e,s') ->from('Employees', 'e') ->leftJoin('e.sales', 's') ->where("upper(e.last_name) like :last_name") ->setParameter('last_name', 'WIN%'); $r = $qb->getQuery()->getResult();
Doctrine 2.0.5 generates the following SQL statement:
SELECT e0_.employee_id AS employee_id0 -- MORE COLUMNS FROM employees e0_ LEFT JOIN sales s1_ ON e0_.subsidiary_id = s1_.subsidiary_id AND e0_.employee_id = s1_.employee_id WHERE UPPER(e0_.last_name) LIKE ?
The execution plan shows the NESTED LOOPS OUTER
operation:
- DB2
Explain Plan --------------------------------------------------------------- ID | Operation | Rows | Cost 1 | RETURN | | 10501 2 | NLJOIN (LEFT) | 5745 of 57 | 10501 3 | FETCH EMPLOYEES | 57 of 57 (100.00%) | 49 4 | RIDSCN | 57 of 57 (100.00%) | 6 5 | SORT (UNIQUE) | 57 of 57 (100.00%) | 6 6 | IXSCAN EMP_NAME | 57 of 10000 ( .57%) | 6 7 | FETCH SALES | 101 of 101 (100.00%) | 183 8 | IXSCAN SALES_SUB_EMP | 101 of 1011118 ( .01%) | 13 Predicate Information 2 - JOIN (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID) JOIN (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) 3 - SARG (Q1.LAST_NAME LIKE ?) 6 - START ($INTERNAL_FUNC$() <= Q1.LAST_NAME) STOP (Q1.LAST_NAME <= $INTERNAL_FUNC$()) SARG (Q1.LAST_NAME LIKE ?) 8 - START (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) START (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID) STOP (Q2.SUBSIDIARY_ID = Q3.SUBSIDIARY_ID) STOP (Q2.EMPLOYEE_ID = Q3.EMPLOYEE_ID)
The
UPPER
was removed from thewhere
-clause to get the expected result.- Oracle
--------------------------------------------------------------- |Id |Operation | Name | Rows | Cost | --------------------------------------------------------------- | 0 |SELECT STATEMENT | | 822 | 38 | | 1 | NESTED LOOPS OUTER | | 822 | 38 | | 2 | TABLE ACCESS BY INDEX ROWID| EMPLOYEES | 1 | 4 | |*3 | INDEX RANGE SCAN | EMP_UP_NAME | 1 | | | 4 | TABLE ACCESS BY INDEX ROWID| SALES | 821 | 34 | |*5 | INDEX RANGE SCAN | SALES_EMP | 31 | | --------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 3 - access(UPPER("LAST_NAME") LIKE 'WIN%') filter(UPPER("LAST_NAME") LIKE 'WIN%') 5 - access("E0_"."SUBSIDIARY_ID"="S1_"."SUBSIDIARY_ID"(+) AND "E0_"."EMPLOYEE_ID" ="S1_"."EMPLOYEE_ID"(+))
The database retrieves the result from the EMPLOYEES
table via EMP_UP_NAME
first and fetches the corresponding
records from the SALES
table for each employee
afterwards.
Tip
Get to know your ORM and take control of joins.
Different ORM tools offer different ways to control join behavior. Eager fetching is just one example that is not even provided by every object-relational mapper. It is good practice to implement a small set of samples that explores the capabilities of your ORM. That’s not only a good exercise, it can also serve as reference during development, and will show you unexpected behavior—like duplication of parent records as a side effect of using joins. Download the samples to get started.
The nested loops join delivers good performance if the driving query returns a small result set. Otherwise, the optimizer might choose an entirely different join algorithm—like the hash join described in the next section, but this is only possible if the application uses a join to tell the database what data it actually needs.
Links
Article: “Latency: Security vs. Performance” about network latencies and SQL applications.