2011-09-18Developers Need to Index
SQL Performance problems are as old as SQL itself. Some might even say that SQL is inherently slow. Although it might have been true in the early days of SQL, it is definitely not true anymore. Nevertheless SQL performance problems are everywhere, everyday. How does this happen?
The SQL language is perhaps the most successful fourth-generation programming language (4GL). Its main benefit is the separation of “what” and “how”. An SQL statement is a straight description what is needed without instructions how to get it done. Consider the following example:
SELECT date_of_birth FROM employees WHERE last_name = 'WINAND'
The SQL query reads like an English sentence that explains the requested data. Writing SQL statements does generally not need any knowledge about inner workings of the database, the storage system (such as disks, files, ...), or the like. There is no need to instruct the database which files to open or how to find the requested rows. Many developers have years of SQL experience, yet they know very little about the processing in the database.
The separation of concerns—what is needed versus how to get it—works remarkably well in SQL, but it is still not perfect. The abstraction reaches its limits when it comes to performance: The author of an SQL statement does per definition not care how the database executes the statement. Consequently, the author is not responsible for a slow execution. However, experience proves the opposite; the author must know a little bit about the database to prevent performance problems.
It turns out that the only thing developers need to know is how to index. Database indexing is, in fact, a development task. That is because the most important information for proper indexing is not the storage system configuration or the hardware setup. The most important information for indexing is how the application queries the data. This knowledge—about the access path—is hardly accessible to database administrators (DBAs) or external consultants. They need good reverse engineering skills and quite a lot of time to gather this information. Development, on the other hand, knows the data and access path anyway.
This book covers everything developers need to know about indexes—and nothing more. To be more precise, the book covers the most important index type only: the B-Tree index.
The B-Tree index works almost identical in various databases. The book uses the terms of the Oracle database only, but the principles apply to other databases as well. Side notes provide relevant information for MySQL, PostgreSQL and Microsoft SQL Server.
The structure of the book is tailor-made for developers; most of the chapters correspond to a specific part of an SQL statement.
- CHAPTER 1 - Anatomy of an Index
The first chapter is the only one that doesn’t cover SQL; it is about the fundamental structure of an index. The understanding of the index structure is essential to follow the later chapters—don’t skip this.
Although the chapter is rather short—about 4 printed pages—you will already understand the phenomenon of slow indexes after working through the chapter.
- CHAPTER 2 - The Where Clause
This is where we pull out all the stops. This chapter explains all aspects of the where clause; beginning with very simple single column lookups down to complex clauses for ranges and special cases like
NULL.This chapter makes up the main body of the book. Once you learn to use these techniques, you will write much faster SQL.
- CHAPTER 3 - Testing and Scalability
This chapter is a little digression about a performance phenomenon that hits developers very often. It explains the performance differences between development and production databases and covers the effects of growing data volumes.
- CHAPTER 4 - The Join Operation
Back to SQL: how to use indexes to perform a fast table join?
- CHAPTER 5 - Clustering Data
Have you ever wondered if there is any difference between selecting a single column or all columns? Here is the answer—along with a trick to get even better performance.
- CHAPTER 6 - Sorting and Grouping
How order by and group by benefit from an index.
- CHAPTER 7 - Partial Results
This chapter explains how to take benefit from pipelined execution when you don’t need the full result set.
- CHAPTER 8 - Insert, Delete and Update (not yet published)
How do indexes affect data changes? An index doesn’t come for free—use them wisely!
- APPENDIX A - Execution Plans
Asking the database how it executes a statement.
- APPENDIX B - Myth Directory
Lists some common myth and explains the truth. Will be extended as the book grows.
- APPENDIX C - Example Schema
All create and insert statements for the tables from the book.
share and subscribe
RSS FeedFlattr this! Follow me on TwitterShare at Google+Like on Facebook