NoSQL Databases — Document, Key-Value, Column & Graph
Relational databases ruled for decades. But as web scale grew — billions of users, flexible JSON payloads, globally distributed writes — relational systems hit hard limits. NoSQL doesn't replace SQL; it solves different problems. Understanding which NoSQL model fits which problem is the skill.
Why NoSQL? Limitations of Relational Databases
The Rigid Schema Problem
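Relational databases require a schema defined up front. Adding a column to a table with 500 million rows can be a multi-hour (or multi-day) operation involving locks and a full table rewrite. Recent versions of PostgreSQL and MySQL can add some columns as a metadata-only change, but many other schema changes still rewrite the table.
In a document database, each document can have different fields, so adding a field needs no schema migration. The application only has to tolerate older documents that lack it.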
Horizontal Scaling
RDBMSs traditionally scale vertically: bigger CPU, more RAM, faster disks. This works until it doesn't. As a rough rule of thumb, a single MySQL server tops out around a few TB of storage and hundreds of thousands of QPS under typical workloads.
NoSQL systems were designed to scale horizontally — add more commodity machines. Each shard holds a slice of data. 10x the load? Add more nodes.
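To make "each shard holds a slice of data" concrete, here is a minimal sketch of hash-based routing. It is an illustration under simple assumptions, not how any particular database does it: `shard_for` and `ShardedStore` are hypothetical names, and each shard is just an in-memory dict standing in for one machine.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Pick a shard index from a stable hash of the key (hypothetical routing rule)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store: each shard is a plain dict standing in for one machine."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key: str, value) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(num_shards=4)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

# Each shard ends up holding a slice of the 1000 keys.
print([len(shard) for shard in store.shards])
```

Plain modulo routing like this remaps most keys when the shard count changes, which is why production systems tend to use consistent hashing or range-based partitioning instead.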
JSON Data Everywhere
Modern APIs return JSON. Storing nested JSON in relational tables requires either complex joins across multiple tables or using a JSONB column (which is essentially a document store bolted onto Postgres).
CAP Theorem: Pick Two
The CAP theorem states a distributed system can guarantee at most two of three properties simultaneously:
- Consistency (C): Every read receives the most recent write (or an error)
- Availability (A): Every request to a non-failing node receives a non-error response (not necessarily the most recent data)
- Partition Tolerance (P): The system continues operating despite network partitions
Network partitions will happen, so P is not optional. In practice, a distributed system chooses how to behave during a partition: CP (reject some requests to stay consistent) or AP (keep answering, possibly with stale data).
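The CP versus AP choice can be sketched with a single replica that loses contact with its primary. This is a toy model under stated assumptions: the `Replica` class, its `mode` flag, and the `Unavailable` exception are hypothetical and do not correspond to any real database's API.

```python
class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


class Replica:
    def __init__(self, mode: str) -> None:
        self.mode = mode          # "CP" or "AP" (hypothetical setting)
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value) -> None:
        # Updates from the primary are lost while the link is down.
        if not self.partitioned:
            self.value = value

    def read(self):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm this is the latest write")
        return self.value  # AP: answer anyway, possibly with stale data


def demo(mode: str):
    replica = Replica(mode)
    replica.replicate("v1")
    replica.partitioned = True
    replica.replicate("v2")  # never arrives at the replica
    try:
        return replica.read()
    except Unavailable:
        return "error"


print(demo("AP"))  # answers with the stale value
print(demo("CP"))  # refuses to answer
```

The AP replica stays available but serves the old value. The CP replica gives up availability rather than risk returning data it cannot verify.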
CAP in Practice
| System | CAP Choice | Trade-off |
|---|---|---|
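| PostgreSQL | CA (single node) | Not distributed by default, so partitions are outside the model |
| MongoDB | CP by default | Single primary per replica set; read/write concern and read preference can relax consistency |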
| Cassandra | AP | Eventually consistent by default |
| HBase | CP | Strong consistency, less available under partition |
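| Redis | Not strictly CP or AP | Primary is authoritative, but replication is asynchronous, so acknowledged writes can be lost on failover |
| DynamoDB | AP or CP | Eventually or strongly consistent reads, chosen per request |
| ZooKeeper | CP | Used for coordination, not storage |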
Document Stores — MongoDB
A document store saves data as self-describing documents, usually JSON or BSON. Joins are rarely needed because related data is embedded in the document.
Core Operations
Insert:
Find (Query):
Update:
Aggregation Pipeline:
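The four operations above can be illustrated without a running server. What follows is a minimal in-memory sketch in Python, not MongoDB's API. The helper names `insert_one`, `find`, `update_one`, and `total_by` are hypothetical, chosen to echo the shell's `insertOne`, `find`, `updateOne`, and a `$group` stage inside `aggregate`. The product documents are made-up sample data.

```python
from collections import defaultdict

products = []  # a plain list stands in for a collection


def insert_one(collection, doc):
    """Store a document; documents need not share the same fields."""
    collection.append(dict(doc))


def find(collection, **equals):
    """Return documents whose top-level fields equal the given values."""
    return [d for d in collection
            if all(d.get(k) == v for k, v in equals.items())]


def update_one(collection, match, changes):
    """Apply `changes` to the first matching document; return how many were updated."""
    for d in collection:
        if all(d.get(k) == v for k, v in match.items()):
            d.update(changes)
            return 1
    return 0


def total_by(collection, group_field, sum_field):
    """Group documents by one field and sum another (a one-stage 'pipeline')."""
    totals = defaultdict(int)
    for d in collection:
        totals[d.get(group_field)] += d.get(sum_field, 0)
    return dict(totals)


# Three documents with different shapes in the same collection.
insert_one(products, {"sku": "L1", "category": "laptop", "price": 1200,
                      "specs": {"ram_gb": 16}})
insert_one(products, {"sku": "S1", "category": "shirt", "price": 25,
                      "sizes": ["S", "M"]})
insert_one(products, {"sku": "L2", "category": "laptop", "price": 900})

laptops = find(products, category="laptop")
updated = update_one(products, {"sku": "S1"}, {"price": 20})
totals = total_by(products, "category", "price")
print(len(laptops), updated, totals)
```

A real document database adds indexes, nested-field queries, and multi-stage pipelines on top of this basic shape.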
When to Use MongoDB
- Content management systems with varied article metadata
- Product catalogs (different products have different attributes)
- User profiles where each user has different optional fields
- Mobile app backends with evolving schemas
- Real-time analytics where you're storing raw events
When NOT to Use MongoDB
- Complex multi-entity transactions (e.g., banking transfers)
- Heavy aggregation across many relationships (SQL wins here)
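- When most operations need ACID guarantees across multiple collections (MongoDB has supported multi-document transactions since version 4.0, but they add overhead and are not the model's strength)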
Key-Value Stores — Redis
Redis stores data as key-value pairs entirely in memory (with optional persistence). It supports rich data structures beyond simple strings.
Data Types
Strings (GET/SET):
Lists (LPUSH/LRANGE):
Sorted Sets (ZADD/ZRANGE) — Leaderboards:
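As a rough illustration of the leaderboard pattern, the sketch below models a sorted set in plain Python. The `Leaderboard` class is hypothetical: `add` plays the role of `ZADD` and `top` plays the role of a highest-first range read. Redis maintains the ordering incrementally, whereas this toy version simply re-sorts on every read.

```python
class Leaderboard:
    """Toy sorted set: one score per member, read back highest first."""

    def __init__(self) -> None:
        self.scores = {}

    def add(self, member: str, score: float) -> None:
        # Like ZADD: adding an existing member overwrites its score.
        self.scores[member] = score

    def top(self, n: int):
        # Highest score first; ties are ordered by member name, descending.
        ranked = sorted(self.scores.items(),
                        key=lambda item: (item[1], item[0]),
                        reverse=True)
        return ranked[:n]


board = Leaderboard()
board.add("alice", 50)
board.add("bob", 80)
board.add("carol", 65)
board.add("alice", 90)  # alice's score is replaced, not duplicated

print(board.top(2))
```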
Hashes:
Sets:
Pub/Sub basics:
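The publish/subscribe pattern can be sketched in-process as follows. The `Broker` class is a hypothetical stand-in for the server: `subscribe` registers a callback on a channel and `publish` delivers a message to every current subscriber, returning how many received it. As with Redis Pub/Sub, a message published to a channel with no subscribers is simply dropped.

```python
from collections import defaultdict


class Broker:
    """Toy message broker: channels map to lists of subscriber callbacks."""

    def __init__(self) -> None:
        self.subscribers = defaultdict(list)

    def subscribe(self, channel: str, callback) -> None:
        self.subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        # Deliver to whoever is listening right now; nothing is stored.
        callbacks = self.subscribers.get(channel, [])
        for callback in callbacks:
            callback(message)
        return len(callbacks)


broker = Broker()
inbox = []
broker.subscribe("chat:room1", inbox.append)

delivered = broker.publish("chat:room1", "hello")
dropped = broker.publish("chat:room2", "anyone there?")
print(delivered, dropped, inbox)
```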
Use Cases
| Use Case | Redis Feature | Pattern |
|---|---|---|
| Session storage | Strings + TTL | SET session:<token> <data> EX 3600 |
| Rate limiting | INCR + TTL | count requests per minute per IP |
| Caching DB results | Strings + TTL | cache SQL query results |
| Leaderboards | Sorted Sets | ZADD/ZREVRANGE |
| Job queues | Lists | LPUSH to enqueue, RPOP to dequeue |
| Real-time chat | Pub/Sub | PUBLISH/SUBSCRIBE |
| Unique counts | HyperLogLog | PFADD/PFCOUNT (approximate) |
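The rate-limiting row (INCR + TTL) is worth spelling out: increment a per-client counter, give the counter an expiry when it is first created, and reject requests once the count passes the limit. Below is a minimal fixed-window sketch of that idea in plain Python. The `FixedWindowLimiter` class and its injectable `clock` parameter are assumptions made for illustration; a dict with stored expiry times stands in for Redis keys with a TTL.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # client_id -> (count, expires_at)

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        count, expires_at = self.counters.get(client_id, (0, 0.0))
        if now >= expires_at:
            # The key has "expired": start a fresh window (like EXPIRE on first INCR).
            count, expires_at = 0, now + self.window
        count += 1  # like INCR
        self.counters[client_id] = (count, expires_at)
        return count <= self.limit


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])

first_three = [limiter.allow("203.0.113.7") for _ in range(3)]
fake_now[0] = 61.0  # the window has passed
after_window = limiter.allow("203.0.113.7")
print(first_three, after_window)
```

Fixed windows are simple but allow bursts at window boundaries. Sliding-window or token-bucket variants smooth that out at the cost of more state.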
Column-Family Stores — Cassandra / HBase
Column-family stores organise data by rows and columns, but unlike relational databases, each row can have a different set of columns. They are optimised for high write throughput and for range reads over wide rows within a partition, not for ad hoc scans across the whole table.
Data Model
The partition key determines which node holds the data. The clustering key determines the sort order within a partition.
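The two roles can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation: the `Table` class is hypothetical, SHA-256 stands in for the real partitioner, and each node is a dict. The partition key is hashed to choose a node, and rows inside a partition are kept sorted by clustering key so range reads are a simple slice.

```python
import bisect
import hashlib


class Table:
    """Toy wide-row table: partition key picks the node, clustering key orders rows."""

    def __init__(self, num_nodes: int) -> None:
        # Each node maps partition_key -> list of (clustering_key, value), kept sorted.
        self.nodes = [{} for _ in range(num_nodes)]

    def node_for(self, partition_key: str) -> int:
        digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def insert(self, partition_key: str, clustering_key, value) -> None:
        rows = self.nodes[self.node_for(partition_key)].setdefault(partition_key, [])
        i = bisect.bisect_left(rows, (clustering_key,))
        if i < len(rows) and rows[i][0] == clustering_key:
            rows[i] = (clustering_key, value)  # same primary key: overwrite
        else:
            rows.insert(i, (clustering_key, value))

    def read_range(self, partition_key: str, start, end):
        """Rows with start <= clustering_key < end, already in sorted order."""
        rows = self.nodes[self.node_for(partition_key)].get(partition_key, [])
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        return rows[lo:hi]


readings = Table(num_nodes=3)
readings.insert("sensor-1", 3, 20.5)  # inserted out of order on purpose
readings.insert("sensor-1", 1, 19.0)
readings.insert("sensor-1", 2, 19.5)
readings.insert("sensor-2", 1, 30.0)

print(readings.read_range("sensor-1", 1, 3))
```

Because one partition lives on one node and is stored in clustering order, a query that names the partition key and a clustering range touches a single node and reads contiguous rows.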
CQL Queries (Cassandra Query Language)
Key Characteristic: Denormalise for Queries
In Cassandra, you design tables around your queries, not your data model. If you need to query by user AND by event_type, you create two separate tables — each optimised for one access pattern.
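A minimal sketch of that two-table design, using plain dicts in place of tables: the names `events_by_user`, `events_by_type`, and `record_event` are hypothetical. Every write goes to both structures so that each read is a single lookup by its own key.

```python
from collections import defaultdict

events_by_user = defaultdict(list)  # keyed by user_id
events_by_type = defaultdict(list)  # keyed by event_type


def record_event(user_id: str, event_type: str, timestamp: int) -> None:
    """Write the same event twice, once per access pattern."""
    row = {"user_id": user_id, "event_type": event_type, "ts": timestamp}
    events_by_user[user_id].append(row)
    events_by_type[event_type].append(row)


record_event("u1", "click", 100)
record_event("u1", "purchase", 105)
record_event("u2", "click", 110)

# Each query is a direct lookup; no scan or join is needed.
print(len(events_by_user["u1"]), len(events_by_type["click"]))
```

The cost is duplicated storage and an extra write per event, which is the trade this data model makes in exchange for predictable reads.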
Use Cases
- Time-series data: sensor readings, stock prices, application metrics
- IoT at scale: millions of devices writing events per second
- User activity logs: every click, impression, or event across billions of users
- Messaging systems: storing chat message history at scale
Graph Databases — Neo4j
Graph databases store data as nodes (entities), edges (relationships), and properties on both. They shine when relationships between data are as important as the data itself.
Cypher Query Language
Create nodes and relationships:
Query — find friends of friends:
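The friends-of-friends traversal is a two-hop walk over the graph. The sketch below expresses it in plain Python over an adjacency map rather than in Cypher. The `friends` data and the `friends_of_friends` function are made up for illustration.

```python
# Undirected friendship graph as an adjacency map (sample data).
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def friends_of_friends(graph, person):
    """People exactly two hops away: not the person, not a direct friend."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


print(sorted(friends_of_friends(friends, "alice")))
```

In a relational schema the same question needs a self-join per hop. A graph database follows stored relationships directly, so deeper traversals stay cheap.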
Recommendation — "People who bought X also bought":
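The co-purchase recommendation is also a two-hop traversal: from a product to its buyers, then from those buyers to their other products, counting how often each one appears. Here is a small Python sketch of that counting step. The `purchases` data and the `also_bought` function are hypothetical.

```python
from collections import Counter

# Who bought what (sample data).
purchases = {
    "alice": {"laptop", "mouse"},
    "bob": {"laptop", "mouse", "dock"},
    "carol": {"laptop", "mouse"},
    "dave": {"phone"},
}


def also_bought(purchases, product, top_n=3):
    """Rank other products by how many buyers of `product` also bought them."""
    counts = Counter()
    for items in purchases.values():
        if product in items:
            counts.update(items - {product})
    return counts.most_common(top_n)


print(also_bought(purchases, "laptop"))
```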
Fraud detection — find circular transaction patterns:
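Circular transaction patterns amount to finding a path that leaves an account and returns to it. A minimal depth-first sketch in Python follows. The `transfers` data, the `find_cycle` function, and the `max_hops` cap are assumptions for illustration; a graph database would express this as a variable-length path pattern instead.

```python
# Directed "sent money to" graph (sample data).
transfers = {
    "acct_a": ["acct_b"],
    "acct_b": ["acct_c"],
    "acct_c": ["acct_a", "acct_d"],
    "acct_d": [],
}


def find_cycle(graph, start, max_hops=6):
    """Return a path start -> ... -> start using at most max_hops edges, or None."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        for receiver in graph.get(node, []):
            if receiver == start:
                return path + [start]  # money came back to where it began
            # Extend the path only if it stays simple and within the hop limit.
            if receiver not in path and len(path) < max_hops:
                stack.append((receiver, path + [receiver]))
    return None


print(find_cycle(transfers, "acct_a"))
print(find_cycle(transfers, "acct_d"))
```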
Use Cases
- Social networks: follows, likes, friend-of-friend queries
- Fraud detection: circular money flows, unusual relationship patterns
- Recommendation engines: collaborative filtering, content-based recommendations
- Knowledge graphs: entities and their relationships in large ontologies
- Access control: role hierarchies, permission inheritance
SQL vs NoSQL Decision Guide
| Factor | Choose SQL | Choose NoSQL |
|---|---|---|
| Schema | Fixed, well-defined | Evolving, flexible |
| Relationships | Complex, many joins | Few or embedded |
| Transactions | Multi-table ACID required | Single-entity ops fine |
| Consistency | Strong required | Eventual acceptable |
| Scale | Vertical scaling adequate | Horizontal scale needed |
| Query pattern | Ad hoc, flexible queries | Known, repeated access patterns |
| Team familiarity | SQL expertise | NoSQL expertise |
| Data shape | Tabular rows | JSON, graphs, time-series |
Choosing the Right NoSQL Model
| Data Shape / Access Pattern | Use |
|---|---|
| JSON objects with nested data | MongoDB (document) |
| Key lookups, caching, sessions | Redis (key-value) |
| Billions of time-ordered rows | Cassandra (column-family) |
| Relationship traversal | Neo4j (graph) |
| High write throughput, IoT | Cassandra or HBase |
| Full-text search | Elasticsearch |
Polyglot Persistence
Production systems rarely use one database. A common e-commerce stack:
Each database does what it's best at. The application coordinates between them.
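One common way an application coordinates two stores is the cache-aside pattern: read from a fast cache first, fall back to the primary store on a miss, and invalidate the cache on writes. The sketch below is a hypothetical example of that coordination, not a description of any specific stack. `ProductService` is an invented class, and plain dicts stand in for the primary database and the cache.

```python
class ProductService:
    """Toy cache-aside coordinator between a primary store and a cache."""

    def __init__(self, primary_store: dict, cache: dict) -> None:
        self.primary = primary_store
        self.cache = cache
        self.primary_reads = 0  # counts how often the slow path is taken

    def get_product(self, sku: str):
        if sku in self.cache:
            return self.cache[sku]        # fast path: cache hit
        self.primary_reads += 1
        product = self.primary.get(sku)   # slow path: primary store
        if product is not None:
            self.cache[sku] = product     # populate the cache for next time
        return product

    def update_price(self, sku: str, price: float) -> None:
        self.primary[sku] = {**self.primary[sku], "price": price}
        self.cache.pop(sku, None)         # invalidate so readers do not see stale data


service = ProductService(
    primary_store={"sku-1": {"name": "Laptop", "price": 1200}},
    cache={},
)
service.get_product("sku-1")  # miss, reads the primary store
service.get_product("sku-1")  # hit, served from the cache
service.update_price("sku-1", 999)
print(service.get_product("sku-1"), service.primary_reads)
```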
Summary
- Document stores (MongoDB): flexible schemas, embedded data, rich queries. Best for content, catalogs, user data.
- Key-value stores (Redis): in-memory speed, rich data structures. Best for caching, sessions, leaderboards.
- Column-family (Cassandra): massive write throughput, time-series, denormalised for known query patterns.
- Graph (Neo4j): when relationships between entities matter as much as the entities themselves.
- CAP theorem: partition tolerance is non-negotiable in distributed systems. Choose CP or AP based on whether you need strong consistency or high availability.
- NoSQL is not a replacement for SQL; it solves different problems. Many production systems use both.