Database Indexes & Query Optimisation

Without indexes, every query reads every row. On a table with 100 million rows, that's seconds per query. Indexes are the single biggest lever for database performance — but they have costs too.

The Problem: Full Table Scans#

sql

Loading editor…

Without an index, the database reads every row in orders and checks user_id. With 10 million rows, that's 10 million comparisons — O(n).

With an index on user_id, the database jumps directly to matching rows — O(log n).

B-Tree Indexes#

The default index type in every major relational database. A B-Tree (balanced tree) keeps sorted data with O(log n) search, insert, and delete.

Root and internal nodes: contain key ranges to guide the search
Leaf nodes: contain the actual index values and pointers to table rows
Leaf nodes linked: enables efficient range scans without going back to root

Searching for user_id = 42:

Start at root — compare 42 to split keys
Follow left pointer to next level
Reach leaf — get row pointer(s)
Fetch rows from the main table

Creating Indexes#

sql

Loading editor…

Composite Index Column Order#

The order of columns in a composite index matters enormously. A composite index on (a, b) can answer queries on:

WHERE a = ... ✓
WHERE a = ... AND b = ... ✓
WHERE b = ... ✗ (can't use the index — b isn't the leading column)

sql

Loading editor…

Rule: put the most selective column first (or the one used in equality conditions before range conditions).

Covering Indexes#

A covering index includes all columns a query needs, so the database never has to fetch the actual row — the answer is entirely in the index.

sql

Loading editor…

How to Check if an Index is Used — EXPLAIN#

sql

Loading editor…

text

Loading editor…

vs. no index:

text

Loading editor…

Key things to look for in EXPLAIN output:

Seq Scan → full table scan; add an index if this is on a large table
Index Scan → good; using index to fetch rows
Index Only Scan → best; covering index, no table access
Nested Loop / Hash Join → join strategy; large costs may indicate missing index

Index Trade-offs#

	Benefit	Cost
Read performance	O(log n) lookups instead of O(n) scans	None
Write performance	None	Every INSERT/UPDATE/DELETE also updates the index
Storage	None	Indexes use disk space (can be 20-50% of table size)

Don't index everything. Guidelines:

Index columns in WHERE, JOIN ON, ORDER BY clauses of frequent queries
Don't index columns with very low cardinality (e.g., a boolean is_active on a 90% active table)
Don't index rarely-queried tables — the write overhead isn't worth it
Drop unused indexes: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Index Types Beyond B-Tree#

Type	Best for	Database
B-Tree	Equality, range, sort	All databases
Hash	Exact equality only	PostgreSQL, MySQL
GiST / SP-GiST	Geometric, full-text, ranges	PostgreSQL
GIN	Full-text search, arrays, JSON	PostgreSQL
Columnstore	Analytics, aggregations on many rows	SQL Server, BigQuery

Query Optimisation Tips#

sql

Loading editor…

Key Takeaways#

Without indexes, queries do full table scans — O(n); with indexes, O(log n)
B-Tree is the universal default; covers equality, ranges, and sorting
Composite index column order matters — leading column must match your WHERE clause
Covering indexes eliminate table row fetches — fastest possible read path
Use EXPLAIN ANALYZE to verify the query planner is using your indexes
Every index adds write overhead — only index columns you actually query on

Database Indexes & Query Optimisation

Without indexes, every query reads every row. On a table with 100 million rows, that's seconds per query. Indexes are the single biggest lever for database performance — but they have costs too.

The Problem: Full Table Scans#

sql

Loading editor…

Without an index, the database reads every row in orders and checks user_id. With 10 million rows, that's 10 million comparisons — O(n).

With an index on user_id, the database jumps directly to matching rows — O(log n).

B-Tree Indexes#

The default index type in every major relational database. A B-Tree (balanced tree) keeps sorted data with O(log n) search, insert, and delete.

Root and internal nodes: contain key ranges to guide the search
Leaf nodes: contain the actual index values and pointers to table rows
Leaf nodes linked: enables efficient range scans without going back to root

Searching for user_id = 42:

Start at root — compare 42 to split keys
Follow left pointer to next level
Reach leaf — get row pointer(s)
Fetch rows from the main table

Creating Indexes#

sql

Loading editor…

Composite Index Column Order#

The order of columns in a composite index matters enormously. A composite index on (a, b) can answer queries on:

WHERE a = ... ✓
WHERE a = ... AND b = ... ✓
WHERE b = ... ✗ (can't use the index — b isn't the leading column)

sql

Loading editor…

Rule: put the most selective column first (or the one used in equality conditions before range conditions).

Covering Indexes#

A covering index includes all columns a query needs, so the database never has to fetch the actual row — the answer is entirely in the index.

sql

Loading editor…

How to Check if an Index is Used — EXPLAIN#

sql

Loading editor…

text

Loading editor…

vs. no index:

text

Loading editor…

Key things to look for in EXPLAIN output:

Seq Scan → full table scan; add an index if this is on a large table
Index Scan → good; using index to fetch rows
Index Only Scan → best; covering index, no table access
Nested Loop / Hash Join → join strategy; large costs may indicate missing index

Index Trade-offs#

	Benefit	Cost
Read performance	O(log n) lookups instead of O(n) scans	None
Write performance	None	Every INSERT/UPDATE/DELETE also updates the index
Storage	None	Indexes use disk space (can be 20-50% of table size)

Don't index everything. Guidelines:

Index columns in WHERE, JOIN ON, ORDER BY clauses of frequent queries
Don't index columns with very low cardinality (e.g., a boolean is_active on a 90% active table)
Don't index rarely-queried tables — the write overhead isn't worth it
Drop unused indexes: SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0;

Index Types Beyond B-Tree#

Type	Best for	Database
B-Tree	Equality, range, sort	All databases
Hash	Exact equality only	PostgreSQL, MySQL
GiST / SP-GiST	Geometric, full-text, ranges	PostgreSQL
GIN	Full-text search, arrays, JSON	PostgreSQL
Columnstore	Analytics, aggregations on many rows	SQL Server, BigQuery

Query Optimisation Tips#

sql

Loading editor…

Key Takeaways#

Without indexes, queries do full table scans — O(n); with indexes, O(log n)
B-Tree is the universal default; covers equality, ranges, and sorting
Composite index column order matters — leading column must match your WHERE clause
Covering indexes eliminate table row fetches — fastest possible read path
Use EXPLAIN ANALYZE to verify the query planner is using your indexes
Every index adds write overhead — only index columns you actually query on