Database Indexes & Query Optimisation

Without indexes, every query reads every row. On a table with 100 million rows, that's seconds per query. Indexes are the single biggest lever for database performance — but they have costs too.

The Problem: Full Table Scans

sql
Loading...

Without an index, the database reads every row in orders and checks user_id. With 10 million rows, that's 10 million comparisons — O(n).

With an index on user_id, the database jumps directly to matching rows — O(log n).

B-Tree Indexes

The default index type in every major relational database. A B-Tree (balanced tree) keeps sorted data with O(log n) search, insert, and delete.

B-Tree on orders.user_id42781151–4143–7779–114116+leaf…leaf…leaf…leaf…leaf nodes: (value, row pointer) pairs — linked list for range scans
  • Root and internal nodes: contain key ranges to guide the search
  • Leaf nodes: contain the actual index values and pointers to table rows
  • Leaf nodes linked: enables efficient range scans without going back to root

Searching for user_id = 42:

  1. Start at root — compare 42 to split keys
  2. Follow left pointer to next level
  3. Reach leaf — get row pointer(s)
  4. Fetch rows from the main table

Creating Indexes

sql
Loading...

Composite Index Column Order

The order of columns in a composite index matters enormously. A composite index on (a, b) can answer queries on:

  • WHERE a = ...
  • WHERE a = ... AND b = ...
  • WHERE b = ... ✗ (can't use the index — b isn't the leading column)
sql
Loading...

Rule: put the most selective column first (or the one used in equality conditions before range conditions).

Covering Indexes

A covering index includes all columns a query needs, so the database never has to fetch the actual row — the answer is entirely in the index.

sql
Loading...

How to Check if an Index is Used — EXPLAIN

sql
Loading...
text
Loading...

vs. no index:

text
Loading...

Key things to look for in EXPLAIN output:

  • Seq Scan → full table scan; add an index if this is on a large table
  • Index Scan → good; using index to fetch rows
  • Index Only Scan → best; covering index, no table access
  • Nested Loop / Hash Join → join strategy; large costs may indicate missing index

Index Trade-offs

BenefitCost
Read performanceO(log n) lookups instead of O(n) scansNone
Write performanceNoneEvery INSERT/UPDATE/DELETE also updates the index
StorageNoneIndexes use disk space (can be 20-50% of table size)

Don't index everything. Guidelines:

  • Index columns in WHERE, JOIN ON, ORDER BY clauses of frequent queries
  • Don't index columns with very low cardinality (e.g., a boolean is_active on a 90% active table)
  • Don't index rarely-queried tables — the write overhead isn't worth it
  • Drop unused indexes: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Index Types Beyond B-Tree

TypeBest forDatabase
B-TreeEquality, range, sortAll databases
HashExact equality onlyPostgreSQL, MySQL
GiST / SP-GiSTGeometric, full-text, rangesPostgreSQL
GINFull-text search, arrays, JSONPostgreSQL
ColumnstoreAnalytics, aggregations on many rowsSQL Server, BigQuery

Query Optimisation Tips

sql
Loading...

Key Takeaways

  • Without indexes, queries do full table scans — O(n); with indexes, O(log n)
  • B-Tree is the universal default; covers equality, ranges, and sorting
  • Composite index column order matters — leading column must match your WHERE clause
  • Covering indexes eliminate table row fetches — fastest possible read path
  • Use EXPLAIN ANALYZE to verify the query planner is using your indexes
  • Every index adds write overhead — only index columns you actually query on