Query Processing & the Query Optimizer
When you write a SQL query, the database doesn't execute it literally. It runs the SQL through a multi-stage pipeline — parsing, rewriting, planning, and only then executing. The query optimizer is the component that transforms your SQL into the most efficient execution strategy it can find. Understanding this pipeline helps you write faster queries and diagnose slow ones.
The Query Lifecycle
Stage 1: Query Parsing
The parser converts raw SQL text into a structured parse tree.
What the Parser Does
-
Tokenisation: splits the text into keywords, identifiers, operators, literals
SELECT,u,.,name,,,COUNT,(,o.id,), ...
-
Syntax tree: validates grammar and builds an AST (Abstract Syntax Tree)
textLoading... -
Semantic analysis (binding phase): replaces table/column names with internal object IDs from the system catalog. Validates that
users.nameexists, thato.user_idis the right type foru.id, and that the user has SELECT permission.
Errors at this stage: relation "userz" does not exist, column "naem" does not exist, operator does not exist: text > integer.
Stage 2: Query Rewriting
Before planning, the rewriter applies a set of algebraic transformations that simplify the query or open up more optimisation opportunities.
Subquery Unnesting
Correlated subqueries are expensive — they run once per outer row. The rewriter often transforms them into joins.
Predicate Pushdown
Move WHERE conditions as close to the data source as possible, filtering out rows early.
View Expansion
Stage 3: Query Planning and Optimization
The planner generates multiple candidate execution plans and estimates the cost of each. It then picks the one with the lowest estimated cost.
Rule-Based vs Cost-Based Optimization
| Approach | How it works | Weakness |
|---|---|---|
| Rule-based (RBO) | Apply fixed transformation rules (e.g., always use index if one exists) | Ignores data distribution, can choose a bad plan |
| Cost-based (CBO) | Use statistics to estimate row counts and I/O, pick minimum cost plan | Only as good as the statistics |
Modern databases (PostgreSQL, MySQL 8+, SQL Server, Oracle) use cost-based optimization with some rule-based heuristics.
Statistics: How the Planner Estimates Row Counts
The planner needs to estimate how many rows each operation will return. It uses statistics collected by ANALYZE (PostgreSQL) or ANALYZE TABLE (MySQL).
Example output:
The planner uses this to estimate: WHERE status = 'pending' will return approximately 18% of rows.
Histograms
For columns with non-uniform distributions, PostgreSQL builds histograms:
Access Paths
Once the planner knows approximately how many rows to expect, it chooses how to physically access the data.
Sequential Scan
Read every page of the table from disk. Unavoidable when:
- No suitable index exists
- The query returns a large fraction of the table (>5-20%)
Index Scan
Traverse the B-tree index, then fetch each matching heap row.
Good for low-selectivity queries (few matching rows). Degrades if many rows match because random I/O is expensive.
Index-Only Scan
All columns needed by the query are in the index. No heap access required.
Fastest access path when applicable. Requires the index to cover all columns referenced in SELECT and WHERE.
Bitmap Index Scan
For queries returning a moderate number of rows (more than index scan is efficient for, less than seqscan):
Can combine multiple bitmaps: BitmapAnd, BitmapOr for multi-condition queries.
Join Strategies
When joining two tables, the planner chooses one of three algorithms based on table sizes, available indexes, and sort order.
Nested Loop Join
For each row in the outer table, scan the inner table.
Hash Join
Build a hash table from the smaller table, then probe it with every row from the larger table.
Merge Join
Sort both tables on the join key, then merge them like merge sort.
| Join Type | Best For | Memory | Requires |
|---|---|---|---|
| Nested Loop | Small outer, indexed inner | Low | Index on inner join key |
| Hash Join | Large tables, no index | High (hash table) | Enough work_mem |
| Merge Join | Pre-sorted inputs | Low | Sorted input or indexes |
Reading EXPLAIN ANALYZE Output
EXPLAIN ANALYZE runs the query and shows the actual execution plan with real row counts and timing.
Example output:
How to Read It
- cost=X..Y: estimated cost. X = startup cost (before first row), Y = total cost. Units are arbitrary but consistent within one EXPLAIN.
- rows=N: estimated rows (planner's guess). Compare to
actual rows=Nto see if estimates were accurate. - actual time=X..Y: actual milliseconds. X = time to first row, Y = total time.
- loops=N: how many times this node ran.
actual timeis per loop; total =actual time * loops. - Filter: ... Rows Removed by Filter: shows how many rows were discarded at this node.
Common Plan Problems
Problem 1: Wrong row estimate (cardinality error)
The planner thought 10 rows would match, picked nested loop join. Actually 85,000 rows — should have been a hash join. Fix: ANALYZE the table to refresh statistics.
Problem 2: Seq scan on a large table
Only 60 rows matched out of 100,000 scanned. An index on user_id would change this to an index scan. Fix: CREATE INDEX ON orders(user_id).
Problem 3: Index exists but not used
Fix: create a functional index: CREATE INDEX ON orders(LOWER(status));
VACUUM ANALYZE: Keeping Statistics Fresh
PostgreSQL's MVCC (multi-version concurrency control) leaves dead row versions after updates and deletes. VACUUM reclaims this space. ANALYZE refreshes the planner statistics.
pg_hint_plan
When the planner consistently makes bad choices, you can override with hints (requires the pg_hint_plan extension):
Prepared Statements
A prepared statement is parsed, bound, and planned once, then executed many times with different parameters.
PostgreSQL uses generic plans after 5 executions: a plan that works for any parameter value. This avoids per-parameter planning overhead but may not be optimal for skewed data.
Python: Running EXPLAIN ANALYZE
Summary
- Query lifecycle: parse → bind → rewrite → plan → execute. The optimizer operates at the plan phase.
- Statistics: the planner's accuracy depends on up-to-date
pg_stats. RunANALYZEafter bulk changes. - Access paths: sequential scan for large fractions of a table; index scan for selective predicates; index-only scan when all columns are in the index.
- Join strategies: nested loop (small + indexed), hash join (large, no index), merge join (pre-sorted). The planner chooses based on estimated row counts.
- EXPLAIN ANALYZE: always read estimated vs actual rows. A large discrepancy is a sign of stale statistics or missing statistics.
- Prepared statements: parse once, execute many times — essential for high-throughput OLTP.
- VACUUM ANALYZE: dead rows skew statistics and waste I/O. Autovacuum handles routine cases; run manually after bulk operations.