Database Indexes & Query Optimisation
Without indexes, every query reads every row. On a table with 100 million rows, that's seconds per query. Indexes are the single biggest lever for database performance — but they have costs too.
The Problem: Full Table Scans
Without an index, the database reads every row in orders and checks user_id. With 10 million rows, that's 10 million comparisons — O(n).
With an index on user_id, the database jumps directly to matching rows — O(log n).
B-Tree Indexes
The default index type in every major relational database. A B-Tree (balanced tree) keeps sorted data with O(log n) search, insert, and delete.
- Root and internal nodes: contain key ranges to guide the search
- Leaf nodes: contain the actual index values and pointers to table rows
- Leaf nodes linked: enables efficient range scans without going back to root
Searching for user_id = 42:
- Start at root — compare 42 to split keys
- Follow left pointer to next level
- Reach leaf — get row pointer(s)
- Fetch rows from the main table
Creating Indexes
Composite Index Column Order
The order of columns in a composite index matters enormously. A composite index on (a, b) can answer queries on:
WHERE a = ...✓WHERE a = ... AND b = ...✓WHERE b = ...✗ (can't use the index —bisn't the leading column)
Rule: put the most selective column first (or the one used in equality conditions before range conditions).
Covering Indexes
A covering index includes all columns a query needs, so the database never has to fetch the actual row — the answer is entirely in the index.
How to Check if an Index is Used — EXPLAIN
vs. no index:
Key things to look for in EXPLAIN output:
Seq Scan→ full table scan; add an index if this is on a large tableIndex Scan→ good; using index to fetch rowsIndex Only Scan→ best; covering index, no table accessNested Loop / Hash Join→ join strategy; large costs may indicate missing index
Index Trade-offs
| Benefit | Cost | |
|---|---|---|
| Read performance | O(log n) lookups instead of O(n) scans | None |
| Write performance | None | Every INSERT/UPDATE/DELETE also updates the index |
| Storage | None | Indexes use disk space (can be 20-50% of table size) |
Don't index everything. Guidelines:
- Index columns in
WHERE,JOIN ON,ORDER BYclauses of frequent queries - Don't index columns with very low cardinality (e.g., a boolean
is_activeon a 90% active table) - Don't index rarely-queried tables — the write overhead isn't worth it
- Drop unused indexes:
SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;
Index Types Beyond B-Tree
| Type | Best for | Database |
|---|---|---|
| B-Tree | Equality, range, sort | All databases |
| Hash | Exact equality only | PostgreSQL, MySQL |
| GiST / SP-GiST | Geometric, full-text, ranges | PostgreSQL |
| GIN | Full-text search, arrays, JSON | PostgreSQL |
| Columnstore | Analytics, aggregations on many rows | SQL Server, BigQuery |
Query Optimisation Tips
Key Takeaways
- Without indexes, queries do full table scans — O(n); with indexes, O(log n)
- B-Tree is the universal default; covers equality, ranges, and sorting
- Composite index column order matters — leading column must match your WHERE clause
- Covering indexes eliminate table row fetches — fastest possible read path
- Use
EXPLAIN ANALYZEto verify the query planner is using your indexes - Every index adds write overhead — only index columns you actually query on