NoSQL Databases — Document, Key-Value, Column & Graph

Relational databases ruled for decades. But as web scale grew — billions of users, flexible JSON payloads, globally distributed writes — relational systems hit hard limits. NoSQL doesn't replace SQL; it solves different problems. Understanding which NoSQL model fits which problem is the skill.

Why NoSQL? Limitations of Relational Databases#

The Rigid Schema Problem#

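Relational databases require a schema defined up front. Changing it on a large table can be costly: when an ALTER TABLE forces a table rewrite, adding a column to a table with 500 million rows can take hours and hold locks for much of that time. Newer engines soften this for simple cases (PostgreSQL 11+ and MySQL 8.0 can add many columns as metadata-only changes), but type changes, backfills, and index builds still need careful planning.

In a document database, each document can have different fields, so adding a field needs no table-level migration. The application still has to handle older documents that lack the new field.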
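The contrast can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SQLite (from the standard library) stands in for a relational engine, a plain list of dicts stands in for a document collection, and the `users` table and `phone` field are made-up names.

```python
import sqlite3

# Relational side: every row shares one schema, so a new attribute
# means changing the table itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")  # schema change
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]

# Document side (hypothetical stand-in): each record carries its own fields,
# so a new field simply appears on the documents that have it.
documents = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Linus", "phone": "555-0100"},
]
# Readers must tolerate documents that lack the field.
phones = [doc.get("phone") for doc in documents]
```

On a tiny SQLite table the ALTER is instant; the cost discussed above only shows up at scale and depends on the engine.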

Horizontal Scaling#

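Relational databases traditionally scale vertically: bigger CPU, more RAM, faster disks. This works until it doesn't. As a rough rule of thumb, a single MySQL server is comfortable up to a few TB of storage and some hundreds of thousands of simple queries per second, although the real ceiling varies widely with hardware and workload.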

NoSQL systems were designed to scale horizontally — add more commodity machines. Each shard holds a slice of data. 10x the load? Add more nodes.
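A minimal sketch of how a key might be mapped to a shard, assuming simple hash-modulo placement. The function names are hypothetical, and production systems generally prefer consistent hashing or range-based placement, because plain modulo moves most keys whenever the shard count changes.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

def distribute(keys, num_shards):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards

keys = [f"user:{i}" for i in range(1000)]
four_shards = distribute(keys, 4)
eight_shards = distribute(keys, 8)  # more nodes, smaller slice per node
```

Each shard ends up holding a slice of the keys, and adding shards shrinks the slice each machine has to serve.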

JSON Data Everywhere#

Modern APIs return JSON. Storing nested JSON in relational tables requires either complex joins across multiple tables or using a JSONB column (which is essentially a document store bolted onto Postgres).

CAP Theorem: Pick Two#

The CAP theorem states a distributed system can guarantee at most two of three properties simultaneously:

Consistency (C): Every read receives the most recent write (or an error)
Availability (A): Every request receives a response (not necessarily the most recent data)
Partition Tolerance (P): The system continues operating despite network partitions

Network partitions will happen. So real systems choose between CP (sacrifice some availability) or AP (sacrifice strict consistency).
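The CP versus AP choice can be made concrete with a toy two-replica store. This is a sketch under simplifying assumptions (two replicas, synchronous replication when healthy, a manually toggled partition flag); the class and exception names are invented for illustration and do not correspond to any real database.

```python
class PartitionedError(Exception):
    """Raised when a CP store refuses a write during a partition."""

class Replica:
    def __init__(self):
        self.value = None

class TwoNodeStore:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise PartitionedError("cannot reach peer; refusing write")
            replica.value = value  # Availability first: accept locally.
            return
        replica.value = value
        other.value = value  # healthy network: both replicas agree

cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
try:
    cp.write(cp.a, "v2")
    cp_rejected = False
except PartitionedError:
    cp_rejected = True

ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")  # accepted, but replica b still holds "v1"
```

The CP store stays consistent by becoming unavailable for writes; the AP store stays available and has to reconcile the divergent replicas after the partition heals.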

CAP in Practice#

System	CAP Choice	Trade-off
PostgreSQL	CA	Not designed for distributed partitions
MongoDB	CP by default	Reading from secondaries or using weaker write concerns trades consistency for availability
Cassandra	AP	Eventually consistent by default
HBase	CP	Strong consistency, less available under partition
Redis	CP-leaning (single primary)	Primary is authoritative, but replication is asynchronous, so a failover can lose acknowledged writes
DynamoDB	AP or CP	Eventual or strong consistency per request
ZooKeeper	CP	Used for coordination, not general-purpose storage

Document Stores — MongoDB#

A document store saves data as self-describing documents, usually JSON or BSON. Related data is typically embedded in the document, so most reads need no joins.
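As a hedged illustration of embedding, here is an order represented as a Python dict. The field names and values are invented; a real schema would depend on the application.

```python
import json

# One self-describing document: customer details and line items are
# embedded, so reading the order needs no join.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "KB-01", "qty": 1, "price": 49.0},
        {"sku": "MS-02", "qty": 2, "price": 19.5},
    ],
}

total = sum(item["qty"] * item["price"] for item in order["items"])
serialized = json.dumps(order)  # the same shape travels over a JSON API
```

The trade-off is duplication: if the customer's email changes, every order that embeds it may need updating.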


Core Operations#

The core operations are insert (`insertOne`, `insertMany`), find (`find`, `findOne`), update (`updateOne`, `updateMany`), and the aggregation pipeline (`aggregate`).
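The semantics of insert, find, update, and a simple grouping aggregation can be sketched without a server. The `Collection` class below is a hypothetical in-memory stand-in, not the MongoDB driver API: it supports only exact-match queries, and `total_by` imitates what a group-and-sum aggregation stage would compute.

```python
from collections import defaultdict

class Collection:
    """Tiny in-memory stand-in for a document collection (illustrative)."""

    def __init__(self):
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(dict(doc))

    def find(self, query):
        # Exact-match filter on every field named in the query.
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in query.items())]

    def update_one(self, query, changes):
        # Apply changes to the first matching document only.
        for doc in self.docs:
            if all(doc.get(k) == v for k, v in query.items()):
                doc.update(changes)
                return 1
        return 0

    def total_by(self, group_field, sum_field):
        # Group documents by one field and sum another.
        totals = defaultdict(float)
        for doc in self.docs:
            totals[doc[group_field]] += doc[sum_field]
        return dict(totals)

orders = Collection()
orders.insert_one({"user": "ada", "status": "paid", "amount": 30.0})
orders.insert_one({"user": "ada", "status": "pending", "amount": 12.5})
orders.insert_one({"user": "linus", "status": "paid", "amount": 8.0})

modified = orders.update_one({"user": "ada", "status": "pending"},
                             {"status": "paid"})
paid = orders.find({"status": "paid"})
totals = orders.total_by("user", "amount")
```

A real deployment would add indexes, richer query operators, and multi-stage pipelines; the sketch only shows the shape of each operation.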

When to Use MongoDB#

Content management systems with varied article metadata
Product catalogs (different products have different attributes)
User profiles where each user has different optional fields
Mobile app backends with evolving schemas
Real-time analytics where you're storing raw events

When NOT to Use MongoDB#

Complex multi-entity transactions (e.g., banking transfers)
Heavy aggregation across many relationships (SQL wins here)
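When strong ACID guarantees across multiple collections are the norm rather than the exception (MongoDB 4.0+ supports multi-document transactions, but they add overhead and are not the primary design point)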

Key-Value Stores — Redis#

Redis stores data as key-value pairs entirely in memory (with optional persistence). It supports rich data structures beyond simple strings.

Data Types#

Strings (GET/SET)
Lists (LPUSH/LRANGE)
Sorted sets (ZADD/ZRANGE), the usual basis for leaderboards
Hashes (HSET/HGETALL)
Sets (SADD/SMEMBERS)
Pub/Sub basics (PUBLISH/SUBSCRIBE)
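To show what each data type does, here is a toy in-memory model in Python. `MiniRedis` is a hypothetical class written for this illustration, not a Redis client; each method is commented with the Redis command it imitates, and edge cases (negative indexes other than -1, score ties, expiry) are deliberately left out.

```python
from collections import defaultdict, deque

class MiniRedis:
    """In-memory toy; comments name the Redis command each method imitates."""

    def __init__(self):
        self.strings = {}
        self.lists = defaultdict(deque)
        self.zsets = defaultdict(dict)      # key -> {member: score}
        self.hashes = defaultdict(dict)
        self.sets = defaultdict(set)
        self.subscribers = defaultdict(list)

    def set(self, key, value):              # SET key value
        self.strings[key] = value

    def get(self, key):                     # GET key
        return self.strings.get(key)

    def lpush(self, key, value):            # LPUSH key value
        self.lists[key].appendleft(value)
        return len(self.lists[key])

    def lrange(self, key, start, stop):     # LRANGE key start stop (inclusive)
        items = list(self.lists[key])
        return items[start:] if stop == -1 else items[start:stop + 1]

    def zadd(self, key, score, member):     # ZADD key score member
        self.zsets[key][member] = score

    def zrevrange(self, key, start, stop):  # ZREVRANGE key start stop
        scores = self.zsets[key]
        ranked = sorted(scores, key=lambda m: scores[m], reverse=True)
        return ranked[start:] if stop == -1 else ranked[start:stop + 1]

    def hset(self, key, field, value):      # HSET key field value
        self.hashes[key][field] = value

    def hgetall(self, key):                 # HGETALL key
        return dict(self.hashes[key])

    def sadd(self, key, member):            # SADD key member
        self.sets[key].add(member)

    def smembers(self, key):                # SMEMBERS key
        return set(self.sets[key])

    def subscribe(self, channel, callback):  # SUBSCRIBE channel
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):     # PUBLISH channel message
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])

r = MiniRedis()
r.set("greeting", "hello")
r.lpush("queue", "a")
r.lpush("queue", "b")
for score, player in [(120, "ada"), (95, "linus"), (150, "grace")]:
    r.zadd("scores", score, player)
top_two = r.zrevrange("scores", 0, 1)
r.hset("user:1", "name", "Ada")
r.sadd("tags", "nosql")
r.sadd("tags", "nosql")  # duplicates are ignored
received = []
r.subscribe("chat", received.append)
delivered = r.publish("chat", "hi")
```

Real Redis performs these operations atomically on the server and adds expiry, persistence, and blocking variants that the toy omits.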

Use Cases#

Use Case	Redis Feature	Pattern
Session storage	Strings + TTL	`SET session:<token> <data> EX 3600`
Rate limiting	INCR + TTL	count requests per minute per IP
Caching DB results	Strings + TTL	cache SQL query results
Leaderboards	Sorted Sets	ZADD/ZREVRANGE
Job queues	Lists	LPUSH to enqueue, RPOP to dequeue
Real-time chat	Pub/Sub	PUBLISH/SUBSCRIBE
Unique counts	HyperLogLog	PFADD/PFCOUNT (approximate)
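The rate-limiting row can be sketched as a fixed-window counter. This is an in-memory illustration, assuming one counter per client per window; in Redis the counter would be a key incremented with INCR and given a TTL so it disappears when the window ends. The class name and parameters are hypothetical.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per client per time window."""

    def __init__(self, limit, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.counters = {}  # (client, window index) -> request count

    def allow(self, client, now):
        # All timestamps inside the same window share one counter.
        bucket = (client, int(now // self.window))
        count = self.counters.get(bucket, 0) + 1  # plays the role of INCR
        self.counters[bucket] = count
        # Unlike a Redis key with a TTL, old buckets are never evicted here.
        return count <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
first_window = [limiter.allow("10.0.0.1", now=t) for t in (0, 1, 2, 3)]
next_window = limiter.allow("10.0.0.1", now=61)
```

Fixed windows allow short bursts at window boundaries; sliding-window or token-bucket variants smooth that out at the cost of more state.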

Column-Family Stores — Cassandra / HBase#

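Column-family stores organise data by rows and columns, but unlike relational databases, each row can have a different set of columns. They are optimised for high write throughput and for reading contiguous ranges of rows within a partition, rather than for ad hoc scans across the whole table.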

Data Model#


The partition key determines which node holds the data. The clustering key determines the sort order within a partition.
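The roles of the two keys can be sketched in Python. This is a simplified model, assuming a made-up `WideTable` class: a byte-sum modulo stands in for the token-ring hashing a real cluster would use, and each partition is kept as a list sorted by clustering key.

```python
import bisect
from collections import defaultdict

class WideTable:
    """Toy model: partition key picks the node, clustering key orders rows."""

    def __init__(self, num_nodes):
        self.num_nodes = num_nodes
        # partition key -> list of (clustering key, value), kept sorted
        self.partitions = defaultdict(list)

    def node_for(self, partition_key):
        # Deterministic stand-in for hashing the key onto a token ring.
        return sum(partition_key.encode("utf-8")) % self.num_nodes

    def insert(self, partition_key, clustering_key, value):
        bisect.insort(self.partitions[partition_key], (clustering_key, value))

    def range_query(self, partition_key, start, end):
        """Rows with start <= clustering key < end, in clustering order."""
        rows = self.partitions[partition_key]
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]

table = WideTable(num_nodes=3)
table.insert("sensor-1", 30, 21.5)  # inserted out of order on purpose
table.insert("sensor-1", 10, 20.0)
table.insert("sensor-1", 20, 20.7)
table.insert("sensor-2", 10, 18.2)
window = table.range_query("sensor-1", 10, 30)
```

Because rows inside a partition are already sorted, a time-range read touches one node and one contiguous slice.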

CQL Queries (Cassandra Query Language)#

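CQL looks like SQL, but queries are expected to filter on the partition key. Filtering on other columns requires a secondary index or ALLOW FILTERING, both of which can be expensive on large tables.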

Key Characteristic: Denormalise for Queries#

In Cassandra, you design tables around your queries, not your data model. If you need to query by user AND by event_type, you create two separate tables — each optimised for one access pattern.
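The two-table pattern can be sketched as writing each event twice, once per access pattern. The structures below are plain Python dicts standing in for two tables; the names `events_by_user` and `events_by_type` are illustrative assumptions.

```python
from collections import defaultdict

# Two "tables", each keyed by the field its query filters on.
events_by_user = defaultdict(list)
events_by_type = defaultdict(list)

def record_event(user_id, event_type, timestamp):
    """Write the same event to both tables (denormalised on purpose)."""
    event = {"user_id": user_id, "event_type": event_type, "ts": timestamp}
    events_by_user[user_id].append(event)
    events_by_type[event_type].append(event)

record_event("u1", "click", 100)
record_event("u2", "click", 101)
record_event("u1", "purchase", 102)

u1_events = events_by_user["u1"]      # query by user: one lookup
clicks = events_by_type["click"]      # query by event type: one lookup
```

Writes cost more and storage is duplicated, but every read is a single-partition lookup, which is the trade this data model is built around.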

Use Cases#

Time-series data: sensor readings, stock prices, application metrics
IoT at scale: millions of devices writing events per second
User activity logs: every click, impression, or event across billions of users
Messaging systems: storing chat message history at scale

Graph Databases — Neo4j#

Graph databases store data as nodes (entities), edges (relationships), and properties on both. They shine when relationships between data are as important as the data itself.

Cypher Query Language#

Typical Cypher queries include:

Creating nodes and relationships
Finding friends of friends
Recommendations: "People who bought X also bought"
Fraud detection: finding circular transaction patterns
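The traversals behind these queries can be sketched over plain adjacency sets in Python. This is an illustration of the logic only, not Cypher and not how a graph database executes it; all names and data are invented.

```python
from collections import defaultdict

friends = defaultdict(set)    # person -> set of friends (undirected)
purchases = defaultdict(set)  # customer -> set of products bought
transfers = defaultdict(set)  # account -> accounts it sent money to

def add_friendship(a, b):
    friends[a].add(b)
    friends[b].add(a)

def friends_of_friends(person):
    """People two hops away who are not already direct friends."""
    direct = friends[person]
    result = set()
    for friend in direct:
        result |= friends[friend]
    return result - direct - {person}

def also_bought(product):
    """Other products bought by customers who bought `product`."""
    counts = defaultdict(int)
    for items in purchases.values():
        if product in items:
            for other in items - {product}:
                counts[other] += 1
    # Most frequently co-purchased first; name breaks ties.
    return sorted(counts, key=lambda p: (-counts[p], p))

def has_cycle_from(start):
    """True if money sent from `start` can flow back to it."""
    stack, seen = list(transfers[start]), set()
    while stack:
        account = stack.pop()
        if account == start:
            return True
        if account not in seen:
            seen.add(account)
            stack.extend(transfers[account])
    return False

add_friendship("ada", "bob")
add_friendship("bob", "cy")
add_friendship("ada", "dee")
add_friendship("dee", "cy")
purchases["c1"] = {"laptop", "mouse"}
purchases["c2"] = {"laptop", "mouse", "dock"}
purchases["c3"] = {"mouse"}
transfers["a"].add("b")
transfers["b"].add("c")
transfers["c"].add("a")
transfers["x"].add("y")

fof = friends_of_friends("ada")
recommended = also_bought("laptop")
```

A graph database stores these adjacency links natively, so multi-hop traversals avoid the repeated joins a relational schema would need.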

Use Cases#

Social networks: follows, likes, friend-of-friend queries
Fraud detection: circular money flows, unusual relationship patterns
Recommendation engines: collaborative filtering, content-based recommendations
Knowledge graphs: entities and their relationships in large ontologies
Access control: role hierarchies, permission inheritance

SQL vs NoSQL Decision Guide#

Factor	Choose SQL	Choose NoSQL
Schema	Fixed, well-defined	Evolving, flexible
Relationships	Complex, many joins	Few or embedded
Transactions	Multi-table ACID required	Single-entity ops fine
Consistency	Strong required	Eventual acceptable
Scale	Vertical scaling adequate	Horizontal scale needed
Query pattern	Ad hoc, flexible queries	Known, repeated access patterns
Team familiarity	SQL expertise	NoSQL expertise
Data shape	Tabular rows	JSON, graphs, time-series

Choosing the Right NoSQL Model#

Data Shape / Access Pattern	Use
JSON objects with nested data	MongoDB (document)
Key lookups, caching, sessions	Redis (key-value)
Billions of time-ordered rows	Cassandra (column-family)
Relationship traversal	Neo4j (graph)
High write throughput, IoT	Cassandra or HBase
Full-text search	Elasticsearch

Polyglot Persistence#

Production systems rarely use one database. In a typical e-commerce stack, each database does what it's best at, and the application coordinates between them.

Summary#

Document stores (MongoDB): flexible schemas, embedded data, rich queries. Best for content, catalogs, user data.
Key-value stores (Redis): in-memory speed, rich data structures. Best for caching, sessions, leaderboards.
Column-family (Cassandra): massive write throughput, time-series, denormalised for known query patterns.
Graph (Neo4j): when relationships between entities matter as much as the entities themselves.
CAP theorem: partition tolerance is non-negotiable in distributed systems. Choose CP or AP based on whether you need strong consistency or high availability.
NoSQL is not a replacement for SQL — it solves different problems. The best production systems use both.

