Nested Loops


Applies to
MySQLYes
OracleYes
PostgreSQLYes
SQL ServerYes

The nested loops join is the most fundamental join algorithm. It is implemented like using two select queries: one to fetch the results from the first table—the driving query—and another one on the second table for each row from the driving query.

The nested loops algorithm can be implemented on the client side as well—using “nested selects”. But that is a problematic approach because the network latencies occur on top of the disk latencies—making the overall response time even worse. “Nested selects” are still very common because they might be implemented without being noticed. Especially object-relational mapping (ORM) tools use nested selects often—the so-called N+1 selects problem.

It's a book!
You are just reading a book. Here is the table of content

The following examples show these “accidental nested select” joins with different ORM tools. The samples search the EMPLOYEES table for entries where upper(last_name) begins with 'WIN' and fetch all SALES records for the selected employees.

Java

The JPA example uses the CriteriaBuilder interface.

CriteriaBuilder queryBuilder = em.getCriteriaBuilder();
CriteriaQuery<Employees>
   query = queryBuilder.createQuery(Employees.class);
Root<Employees> r = query.from(Employees.class); 
query.where(
  queryBuilder.like(
    queryBuilder.upper(r.get(Employees_.lastName)),
    "WIN%"
  )
);

List<Employees> emp = em.createQuery(query).getResultList();

for (Employees e: emp) {
  // process Employee
  for (Sales s: e.getSales()) {
    // process sale for Employee
  }
}

Hibernate JPA 3.6.0 generates the following SQL statements (N+1 selects):

select employees0_.subsidiary_id as subsidiary1_0_
       -- MORE COLUMNS
  from employees employees0_ 
 where upper(employees0_.last_name) like ?

  select sales0_.subsidiary_id as subsidiary4_0_1_
         -- MORE COLUMNS
    from sales sales0_
   where sales0_.subsidiary_id=? 
     and sales0_.employee_id=?

  select sales0_.subsidiary_id as subsidiary4_0_1_
         -- MORE COLUMNS
    from sales sales0_
   where sales0_.subsidiary_id=? 
     and sales0_.employee_id=?
Perl

The following sample demonstrates Perl’s DBIx::Class 0.08192 framework:

my @employees = 
   $schema->resultset('Employees')
          ->search({'UPPER(last_name)' => {-like=>'WIN%'}});

foreach my $employee (@employees) {
   # process Employee
   foreach my $sale ($employee->sales) {
      # process Sale for Employee
   }
}

DBIx::Class generates the following SQL statements (N+1 selects):

SELECT me.employee_id, me.subsidiary_id
     , me.last_name, me.first_name, me.date_of_birth 
  FROM employees me 
 WHERE ( UPPER(last_name) LIKE ? )


   SELECT me.sale_id, me.employee_id, me.subsidiary_id
        , me.sale_date, me.eur_value
     FROM sales me
    WHERE ( ( me.employee_id = ? 
      AND me.subsidiary_id = ? ) )

   SELECT me.sale_id, me.employee_id, me.subsidiary_id
        , me.sale_date, me.eur_value
     FROM sales me
    WHERE ( ( me.employee_id = ? 
      AND me.subsidiary_id = ? ) )
PHP

The Doctrine 2.0.5 sample uses the query builder interface:

$qb = $em->createQueryBuilder();
$qb->select('e')
   ->from('Employees', 'e')
   ->where("upper(e.last_name) like :last_name")
   ->setParameter('last_name', 'WIN%');
$r = $qb->getQuery()->getResult();

foreach ($r as $row) {
   // process Employee
   foreach ($row->getSales() as $sale) {
      // process Sale for Employee
   }
}

Doctrine generates the following SQL statements (N+1 selects):

SELECT e0_.employee_id AS employee_id0 -- MORE COLUMNS
  FROM employees e0_
 WHERE UPPER(e0_.last_name) LIKE ?

   SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS
     FROM sales t0 
    WHERE t0.subsidiary_id = ? 
      AND t0.employee_id = ?

   SELECT t0.sale_id AS SALE_ID1 -- MORE COLUMNS
     FROM sales t0 
    WHERE t0.subsidiary_id = ? 
      AND t0.employee_id = ?

The ORMs don’t generate SQL joins—they query the SALES table separately. The effect is known as the “N+1 problem” or “N+1 selects problem” because N+1 is the total number of executed SQL statements if the first select returns N rows.

Tweet this tip

Enabling SQL-Logging

Enable SQL logging during development and review the generated SQL statements.

DBIx::Class

export DBIC_TRACE=1 in your shell.

Doctrine

Only programmatically—don’t forget to disable for production. Consider building a custom, configurable logger.

$logger = new \Doctrine\DBAL\Logging\EchoSqlLogger;
$config->setSQLLogger($logger);
Hibernate (native)

<property name="show_sql">true</property> in hibernate.cfg.xml

JPA

In persistence.xml but depending on the JPA provider—e.g., for eclipselink, Hibernate and OpenJPA:

<property name="eclipselink.logging.level" value="FINE"/>
<property name="hibernate.show_sql" value="TRUE"/>
<property name="openjpa.Log" value="SQL=TRACE"/>
NHibernate

<property name="show_sql">true</property> in App.config

Most ORMs provide a programmatic way to enable SQL logging as well. That bears the risk to deploy the setting to production.

Even though the “nested select” approach should be avoided, it explains the nested loops join pretty well. The database executes the join like the ORM tools do. That means that indexing for a nested loops join is like indexing for the queries above. That is, a function based index on EMPLOYEES, and a concatenated index for the join predicates on the SALES table:

CREATE INDEX emp_up_name ON employees (UPPER(last_name));
CREATE INDEX sales_emp ON sales (subsidiary_id, employee_id);

An SQL join is more efficient than the nested select approach—even though it uses the same indexes and predicates. That is partly because the database applies internal optimizations like caching index root nodes, but mostly because of the reduced network activity. It is even true if the join results in more data if the employee attributes are duplicated for each sale. Remember that performance has two dimensions: response time and throughput. They are called latency and bandwidth in networking context. Bandwidth is, typically, not affecting response time—but latency is. That means that the number of database round trips is more important than the transmitted data volume.

Tweet this tip

Tip

Executing joins in the database is the most important tuning step when using ORM tools.

Most ORM tools offer different ways to create an SQL join. The so-called eager fetching mode is probably the most important. It is typically configured in the entity mappings along with the join predicates. That means it can be enabled on property level, e.g., for the employees property in the Sales class so that the ORM tool joins the EMPLOYEES table whenever searching for SALES. Configuring eager fetching in the entity mappings is very convenient if the employee details are almost always required along with sales data.

Eager fetching is counter productive if the child records are accessed rarely. The telephone directory application, for example, doesn’t need sales data. It doesn’t make sense to eagerly fetch sales along with the employees. Fetching employees may or may not need the related sales data—a static configuring is no solution. Derived classes can provide the required flexibility, e.g., by introducing a new class, SalesPerson, that fetches sales eagerly. The Employees class would not need the sales property anymore—thus, preventing the N+1 problem entirely.

Tweet this tip

Tip

Get to know your ORM and take control of joins.

Different ORM tools offer different ways to control join behaviour. Eager fetching is just one example that is not even provided by every object-relational mapper. It is good practice to implement a small set of samples that explores the capabilities of your ORM. That’s not only a good exercise, it can also serve as reference during development, and will show you unexpected behaviour—like duplication of parent records as a side effect of using joins. Download the samples to get started.

Controlling joins is essential for performance. The following samples demonstrate how to control joining at runtime, providing great flexibility:

Java

The JPA CriteriaBuilder interface provides the Root<>.fetch() method for joins. It allows to specify which client relation should be fetched and what kind of join operation is required. In that sample, we need a left join to retrieve all employees, regardless if they have sales or not.

Warning

JPA, and Hibernate, return every employee for each sale.

That means, an employee with 30 sales will appear 30 times. That is very disturbing, but defined behaviour (EJB 3.0 persistency, paragraph 4.4.5.3 “Fetch Joins”). You can either manually de-duplicate the parent relation or use distinct(), as shown in the sample.

CriteriaBuilder qb = em.getCriteriaBuilder();
CriteriaQuery<Employees> q = qb.createQuery(Employees.class);
Root<Employees> r = q.from(Employees.class); 
query.where(queryBuilder.like(
    queryBuilder.upper(r.get(Employees_.lastName)),
    "WIN%")
);

r.fetch("sales", JoinType.LEFT);
// needed to avoid duplication of Employee records
query.distinct(true);

List<Employees> emp = em.createQuery(query).getResultList();

Hibernate 3.6.0 generates the following SQL statements:

select distinct 
       employees0_.subsidiary_id as subsidiary1_0_0_
     , employees0_.employee_id as employee2_0_0_
       -- MORE COLUMNS
     , sales1_.sale_id as sale1_0__
  from employees employees0_
  left outer join sales sales1_ 
          on employees0_.subsidiary_id=sales1_.subsidiary_id
         and employees0_.employee_id=sales1_.employee_id 
 where upper(employees0_.last_name) like ?

The query has the expected left join but adds the distinct keyword as well. JPA does, unfortunately, not provide separate API calls to de-duplicate the parent entity without de-duplicating the child records as well. The distinct keyword is, however, alarming because most databases will actually perform it. Only a few databases recognize that the primary keys guarantee uniqueness in that case.

The native Hibernate API solves the problem differently, with a result set transformer:

Criteria c = session.createCriteria(Employees.class);
c.add(Restrictions.ilike("lastName", 'Win%'));

c.setFetchMode("sales", FetchMode.JOIN);
c.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

List<Employees> result = c.list();

It generates the following query:

select this_.subsidiary_id as subsidiary1_0_1_
     , this_.employee_id as employee2_0_1_
       -- MORE this_ columns on employees
     , sales2_.sale_id as sale1_3_
       -- MORE sales2_ columns on sales
  from employees this_ 
  left outer join sales sales2_ 
          on this_.subsidiary_id=sales2_.subsidiary_id 
         and this_.employee_id=sales2_.employee_id 
 where lower(this_.last_name) like ?

This method produces straight SQL without unintended clauses. But note that Hibernate uses lower() for case insensitive queries—an important detail for function based indexing.

Perl

The following sample demonstrates Perl’s DBIx::Class 0.08192 framework:

my @employees = 
   $schema->resultset('Employees')
          ->search({ 'UPPER(last_name)' => {-like => 'WIN%'}
                   , {prefetch => ['sales']}
                   });

DBIx::Class generates the following SQL statements:

SELECT me.employee_id, me.subsidiary_id, me.last_name
       -- MORE COLUMNS
  FROM employees me 
  LEFT JOIN sales sales 
         ON (sales.employee_id   = me.employee_id 
        AND  sales.subsidiary_id = me.subsidiary_id) 
 WHERE ( UPPER(last_name) LIKE ? )
 ORDER BY sales.employee_id, sales.subsidiary_id

Note the ORDER BY clause. It was not requested by the application. The database has to sort the result set accordingly. That takes some time and resources in the database.

PHP

The following sample demonstrates PHP’s Doctrine 2.0.5 framework:

$qb = $em->createQueryBuilder();
$qb->select('e,s')
   ->from('Employees', 'e')
   ->leftJoin('e.sales', 's')
   ->where("upper(e.last_name) like :last_name")
   ->setParameter('last_name', 'WIN%');
$r = $qb->getQuery()->getResult();

Doctrine generates the following SQL statements:

SELECT e0_.employee_id AS employee_id0
       -- MORE COLUMNS
  FROM employees e0_ 
  LEFT JOIN sales s1_ 
         ON e0_.subsidiary_id = s1_.subsidiary_id 
        AND e0_.employee_id = s1_.employee_id 
 WHERE UPPER(e0_.last_name) LIKE ?

The execution plan shows the NESTED LOOPS operation:

---------------------------------------------------------------
|Id |Operation                    | Name        | Rows | Cost |
---------------------------------------------------------------
| 0 |SELECT STATEMENT             |             |  822 |   38 |
| 1 | NESTED LOOPS OUTER          |             |  822 |   38 |
| 2 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES   |    1 |    4 |
|*3 |   INDEX RANGE SCAN          | EMP_UP_NAME |    1 |      |
| 4 |  TABLE ACCESS BY INDEX ROWID| SALES       |  821 |   34 |
|*5 |   INDEX RANGE SCAN          | SALES_EMP   |   31 |      |
---------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
  3 - access(UPPER("LAST_NAME") LIKE 'WIN%')
      filter(UPPER("LAST_NAME") LIKE 'WIN%')
  5 - access("E0_"."SUBSIDIARY_ID"="S1_"."SUBSIDIARY_ID"(+)
        AND  "E0_"."EMPLOYEE_ID"  ="S1_"."EMPLOYEE_ID"(+))

The database retrieves the result from the first table first (EMPLOYEES via EMP_UP_NAME) and fetches the corresponding records from the second table (SALES via SALES_EMP).

The nested loops join delivers good performance if the driving query returns a small result set. The optimizer will take the smaller result set as driving query automatically. Moreover, the optimizer might choose an entirely different join algorithm for larger result sets—like the hash join described in the next section. But that is only possible if the application uses an SQL join to query all data in one shot.

Recent Questions at Ask.Use-The-Index-Luke.com

0
votes
1
answer
229
views

query regd the CBO decision

Apr 17 at 10:27 Hulda(suspended)
index-choice optimizer
0
votes
3
answers
2.0k
views

Examples for Function Based Indexes?

Mar 25 at 15:52 Castorp 1
function-based
0
votes
1
answer
610
views

Updating multiple rows using a subquery in SQL

Jan 08 at 09:52 Jan 26
subquery update sql