NoSQL Databases — Document, Key-Value, Column & Graph

Relational databases ruled for decades. But as web scale grew — billions of users, flexible JSON payloads, globally distributed writes — relational systems hit hard limits. NoSQL doesn't replace SQL; it solves different problems. Understanding which NoSQL model fits which problem is the skill.

Why NoSQL? Limitations of Relational Databases

The Rigid Schema Problem

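Relational databases require a schema defined up front, and changing it on a large table can be expensive. Depending on the engine, its version, and the kind of change, adding a column to a table with 500 million rows can take hours, hold locks, or rewrite the whole table. Newer releases of MySQL and PostgreSQL make some column additions metadata-only, but many other schema changes still force a rewrite.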


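In a document database, each document can have different fields, so adding a field needs no schema migration. The application still has to handle older documents that lack the new field.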
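
The contrast can be sketched in a few lines of Python. SQLite is used only because it ships with Python; its ALTER TABLE is cheap and says nothing about migration cost on a large MySQL or PostgreSQL table. The users table, the phone field, and the sample values are illustrative assumptions.

```python
import sqlite3

# Relational side: the column must exist in the schema before any row can use it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Alice')")
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")  # the schema migration
conn.execute("INSERT INTO users (name, phone) VALUES ('Bob', '555-0100')")
rows = conn.execute("SELECT name, phone FROM users ORDER BY id").fetchall()

# Document side: each document carries only the fields it has.
users = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob", "phone": "555-0100"},
]

# Readers must tolerate the missing field.
phones = [u.get("phone") for u in users]
```

In both cases Alice ends up without a phone number; the difference is whether the database or the application code is responsible for knowing that the field may be absent.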

Horizontal Scaling

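Relational databases have traditionally scaled vertically: bigger CPU, more RAM, faster disks. This works until the largest available machine is no longer enough. As a rough rule of thumb, a single MySQL server handles a few TB of data and up to hundreds of thousands of simple queries per second, though the real ceiling depends heavily on hardware and workload. Past that point a relational system needs read replicas or application-level sharding.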

NoSQL systems were designed to scale horizontally — add more commodity machines. Each shard holds a slice of data. 10x the load? Add more nodes.
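
A minimal sketch of how a key might be routed to a shard, assuming plain hash-modulo placement. Real systems usually prefer consistent hashing or range partitioning, and the function name and key format here are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard with a stable hash (not Python's per-process hash())."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

keys = [f"user:{i}" for i in range(1000)]

# Group the keys by the shard that would own them in a 4-shard cluster.
placement = {}
for k in keys:
    placement.setdefault(shard_for(k, 4), []).append(k)

# Count how many keys would change owner if a fifth shard were added.
moved = sum(1 for k in keys if shard_for(k, 4) != shard_for(k, 5))
```

With modulo placement most keys change owner when a shard is added, which is the main reason production systems tend to use schemes that move only a fraction of the data.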

[Diagram: relational (vertical) scaling, one big server with 32 cores and 512 GB RAM, versus NoSQL (horizontal) scaling, Node 1, Node 2, ... N+1, adding more nodes as needed]

JSON Data Everywhere

Modern APIs return JSON. Storing nested JSON in relational tables requires either complex joins across multiple tables or using a JSONB column (which is essentially a document store bolted onto Postgres).


CAP Theorem: Pick Two

The CAP theorem states a distributed system can guarantee at most two of three properties simultaneously:

  • Consistency (C): Every read receives the most recent write (or an error)
  • Availability (A): Every request receives a response (not necessarily the most recent data)
  • Partition Tolerance (P): The system continues operating despite network partitions

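Network partitions will happen, so a distributed system cannot give up P. The real choice, which only bites while a partition is in progress, is between CP (reject some requests to stay consistent) and AP (keep answering, possibly with stale data).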
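
The difference can be illustrated with a toy model. This is a sketch of the behaviour only, not of any real replication protocol; the Replica class and its partitioned flag are assumptions made for the example.

```python
class Replica:
    """One copy of a single value; `partitioned` means it cannot reach its peers."""

    def __init__(self, mode: str):
        self.mode = mode            # "CP" or "AP"
        self.value = "v1"           # last value this replica saw before the partition
        self.partitioned = False

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Cannot confirm this copy is current, so refuse rather than risk a stale answer.
            raise RuntimeError("unavailable during partition")
        return self.value           # AP: always answer, possibly with stale data

def read_during_partition(replica: Replica) -> str:
    """Cut the replica off from its peers, then try to read from it."""
    replica.partitioned = True
    try:
        return replica.read()
    except RuntimeError as exc:
        return f"error: {exc}"

cp_result = read_during_partition(Replica("CP"))
ap_result = read_during_partition(Replica("AP"))
```

The CP replica returns an error, while the AP replica returns "v1" even though a newer value may have been written on the other side of the partition.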

[Diagram: CAP triangle with corners Consistency, Availability, and Partition Tolerance. AP side: Cassandra, DynamoDB. CP side: HBase, ZooKeeper. CA side: traditional RDBMS (no partition tolerance).]

CAP in Practice

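System | CAP choice | Trade-off
PostgreSQL (single node) | CA | Not designed for distributed partitions
MongoDB | CP or AP | Tunable through write concern and read concern
Cassandra | AP | Eventually consistent by default
HBase | CP | Strong consistency, less available under partition
Redis | Closest to CP (single primary) | Primary is authoritative, but replication is asynchronous, so a failover can lose acknowledged writes
DynamoDB | AP or CP | Eventually or strongly consistent reads, chosen per request
ZooKeeper | CP | Used for coordination, not general storage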

Document Stores — MongoDB

A document store saves data as self-describing documents, usually JSON or BSON. Related data is typically embedded in the document itself, so most reads need no joins.

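As an illustration of embedding, here is a hypothetical order document written as a Python dict. The field names and values are assumptions, not taken from the original example. The customer and line items live inside the order, so computing the total needs no join.

```python
import json

# Hypothetical order document: customer and line items are embedded, not joined.
order = {
    "_id": "order-1001",
    "customer": {"name": "Alice", "email": "alice@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.50},
        {"sku": "B-7", "qty": 1, "price": 20.00},
    ],
}

# Everything needed for the total is already inside the document.
total = sum(item["qty"] * item["price"] for item in order["items"])

# The same structure serialises directly to JSON for an API response.
as_json = json.dumps(order)
```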

Core Operations

Insert:

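A rough sketch of insert semantics, using an in-memory stand-in rather than the MongoDB driver. The Collection class is hypothetical. MongoDB itself generates an ObjectId when no _id is supplied; a simple counter plays that role here.

```python
import itertools

class Collection:
    """In-memory stand-in for a document collection (not the MongoDB driver)."""

    def __init__(self):
        self.docs = []
        self._ids = itertools.count(1)

    def insert_one(self, doc: dict):
        # Assign an _id unless the caller supplied one (a caller-supplied _id wins).
        stored = {"_id": next(self._ids), **doc}
        self.docs.append(stored)
        return stored["_id"]

products = Collection()
book_id = products.insert_one({"name": "SQL Basics", "pages": 320})
shirt_id = products.insert_one({"name": "T-shirt", "sizes": ["S", "M", "L"]})
```

The two documents share a collection even though they have different fields, which is the property the product-catalog use case relies on.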

Find (Query):

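The shape of a query filter can be sketched with a small matcher over plain dicts. It handles only equality and three operators ($gt, $lt, $in), written the way they appear in a MongoDB filter document. The find function and the sample products are assumptions for illustration.

```python
def find(docs, query):
    """Return the documents whose fields satisfy every condition in `query`."""
    ops = {
        "$gt": lambda value, arg: value is not None and value > arg,
        "$lt": lambda value, arg: value is not None and value < arg,
        "$in": lambda value, arg: value in arg,
    }

    def matches(doc):
        for field, cond in query.items():
            value = doc.get(field)
            if isinstance(cond, dict):
                # Operator form, e.g. {"price": {"$lt": 100}}.
                if not all(ops[op](value, arg) for op, arg in cond.items()):
                    return False
            elif value != cond:
                # Plain equality, e.g. {"category": "electronics"}.
                return False
        return True

    return [d for d in docs if matches(d)]

products = [
    {"name": "Laptop", "category": "electronics", "price": 1200},
    {"name": "Mouse", "category": "electronics", "price": 25},
    {"name": "Desk", "category": "furniture", "price": 300},
]
cheap_electronics = find(products, {"category": "electronics", "price": {"$lt": 100}})
mid_priced = find(products, {"price": {"$gt": 100, "$lt": 1000}})
```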

Update:

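Updates in MongoDB are expressed as operators applied to the matched document. The sketch below imitates two of them, $set and $inc, on plain dicts. The update_one function is a hypothetical stand-in and ignores everything else the real call supports.

```python
def update_one(docs, query, update):
    """Apply $set / $inc to the first document matching every equality in `query`.

    Returns the number of documents modified (0 or 1).
    """
    for doc in docs:
        if all(doc.get(k) == v for k, v in query.items()):
            for field, value in update.get("$set", {}).items():
                doc[field] = value
            for field, delta in update.get("$inc", {}).items():
                # A missing field is treated as 0, so $inc creates it.
                doc[field] = doc.get(field, 0) + delta
            return 1
    return 0

users = [{"name": "Alice", "logins": 3}, {"name": "Bob"}]
modified = update_one(users, {"name": "Bob"}, {"$set": {"plan": "pro"}, "$inc": {"logins": 1}})
not_found = update_one(users, {"name": "Zed"}, {"$set": {"plan": "free"}})
```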

Aggregation Pipeline:

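An aggregation pipeline passes documents through a sequence of stages. The sketch below reproduces three common stages ($match, $group, $sort) by hand in Python so that each step is visible. The orders data is invented for the example.

```python
from collections import defaultdict

orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 50},
    {"customer": "alice", "status": "paid", "amount": 45},
    {"customer": "bob", "status": "cancelled", "amount": 99},
]

# Stage 1 ($match): keep only paid orders.
paid = [o for o in orders if o["status"] == "paid"]

# Stage 2 ($group): total amount per customer.
totals = defaultdict(int)
for o in paid:
    totals[o["customer"]] += o["amount"]

# Stage 3 ($sort): highest total first.
result = sorted(
    ({"_id": customer, "total": total} for customer, total in totals.items()),
    key=lambda row: row["total"],
    reverse=True,
)
```

Filtering before grouping keeps the later stages small, which is also the usual advice for ordering stages in a real pipeline.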

When to Use MongoDB

  • Content management systems with varied article metadata
  • Product catalogs (different products have different attributes)
  • User profiles where each user has different optional fields
  • Mobile app backends with evolving schemas
  • Real-time analytics where you're storing raw events

When NOT to Use MongoDB

  • Complex multi-entity transactions (e.g., banking transfers)
  • Heavy aggregation across many relationships (SQL wins here)
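  • When most operations need ACID guarantees across multiple collections (MongoDB has supported multi-document transactions since version 4.0, but they cost more than single-document writes)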

Key-Value Stores — Redis

Redis stores data as key-value pairs entirely in memory (with optional persistence). It supports rich data structures beyond simple strings.

Data Types

Strings (GET/SET):

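The behaviour of SET with an expiry and GET can be sketched with an in-memory stand-in. TinyKV is a hypothetical class, not a Redis client. A fake clock is injected so the example runs without sleeping.

```python
import time

class TinyKV:
    """In-memory stand-in for Redis strings with expiry (not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}              # key -> (value, expires_at or None)
        self._clock = clock

    def set(self, key, value, ex=None):
        # Mirrors SET key value [EX seconds].
        expires_at = self._clock() + ex if ex is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        # Mirrors GET: returns None for a missing or expired key.
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

now = [0.0]                          # fake clock, advanced by hand
kv = TinyKV(clock=lambda: now[0])
kv.set("session:abc", "user-42", ex=3600)
before = kv.get("session:abc")
now[0] = 3601.0
after = kv.get("session:abc")
```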

Lists (LPUSH/LRANGE):

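List semantics can be imitated with a deque. The helpers below are hypothetical and cover only the simple cases of LPUSH, LRANGE (non-negative start), and RPOP.

```python
from collections import deque

jobs = deque()

def lpush(queue, *values):
    # LPUSH inserts each value at the head, one after another, and returns the new length.
    for v in values:
        queue.appendleft(v)
    return len(queue)

def lrange(queue, start, stop):
    # LRANGE uses inclusive indexes; a negative stop counts from the end (-1 is the last element).
    items = list(queue)
    stop = len(items) + stop if stop < 0 else stop
    return items[start:stop + 1]

def rpop(queue):
    # RPOP removes and returns the tail, or None when the list is empty.
    return queue.pop() if queue else None

length = lpush(jobs, "job-1", "job-2", "job-3")
snapshot = lrange(jobs, 0, -1)
first_out = rpop(jobs)
```

Pushing on the left and popping on the right yields first-in, first-out order, which is why that pair is the usual choice for a simple job queue.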

Sorted Sets (ZADD/ZRANGE) — Leaderboards:

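A leaderboard on a sorted set boils down to "score per member, read back in score order". The sketch below keeps scores in a dict and sorts on read, which is simpler but slower than a real sorted set. The helper names mirror the Redis commands but are plain Python functions, and the players are invented.

```python
scores = {}

def zadd(board, score, member):
    # ZADD sets (or overwrites) the member's score.
    board[member] = score

def zincrby(board, delta, member):
    # ZINCRBY adds to the score, starting from 0 for a new member.
    board[member] = board.get(member, 0) + delta
    return board[member]

def top(board, n):
    # Highest score first, like ZREVRANGE key 0 n-1 WITHSCORES.
    ranked = sorted(board.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[:n]

zadd(scores, 1500, "alice")
zadd(scores, 2300, "bob")
zadd(scores, 1800, "carol")
new_score = zincrby(scores, 900, "alice")
leaders = top(scores, 2)
```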

Hashes:

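A hash stores several named fields under one key, which suits small objects such as a user record. The functions below are hypothetical stand-ins for HSET, HGET, and HINCRBY. Unlike Redis, which stores field values as strings, they keep Python types.

```python
store = {}

def hset(db, key, mapping):
    # HSET key field value [field value ...]; returns how many fields were newly created.
    h = db.setdefault(key, {})
    added = sum(1 for field in mapping if field not in h)
    h.update(mapping)
    return added

def hget(db, key, field):
    # HGET returns None for a missing key or field.
    return db.get(key, {}).get(field)

def hincrby(db, key, field, delta):
    # HINCRBY treats a missing field as 0.
    h = db.setdefault(key, {})
    h[field] = int(h.get(field, 0)) + delta
    return h[field]

created = hset(store, "user:42", {"name": "Alice", "visits": 1})
overwritten = hset(store, "user:42", {"name": "Alice B."})
visits = hincrby(store, "user:42", "visits", 1)
```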

Sets:

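Redis sets behave much like Python sets, so the core commands map almost one to one. The key names and tags below are invented for the example.

```python
tags = {}

def sadd(db, key, *members):
    # SADD returns the number of members that were not already present.
    s = db.setdefault(key, set())
    before = len(s)
    s.update(members)
    return len(s) - before

first_add = sadd(tags, "post:1:tags", "redis", "nosql", "cache")
added_again = sadd(tags, "post:1:tags", "redis")      # duplicate, so nothing is added
sadd(tags, "post:2:tags", "nosql", "graph")

is_member = "cache" in tags["post:1:tags"]            # SISMEMBER
common = tags["post:1:tags"] & tags["post:2:tags"]    # SINTER
either = tags["post:1:tags"] | tags["post:2:tags"]    # SUNION
```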

Pub/Sub basics:

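Pub/Sub delivers each message to whoever is subscribed at that moment and keeps no history. The in-process Broker below is a hypothetical sketch of that behaviour, not a Redis client.

```python
from collections import defaultdict

class Broker:
    """In-process stand-in for Pub/Sub: fire-and-forget delivery to current subscribers."""

    def __init__(self):
        self._subscribers = defaultdict(list)    # channel -> callbacks

    def subscribe(self, channel, callback):
        self._subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Like PUBLISH, return how many subscribers received the message.
        receivers = self._subscribers[channel]
        for callback in receivers:
            callback(message)
        return len(receivers)

broker = Broker()
inbox = []
missed = broker.publish("chat:lobby", "nobody is listening yet")
broker.subscribe("chat:lobby", inbox.append)
delivered = broker.publish("chat:lobby", "hello")
```

The first message is lost because no one was subscribed when it was published. If messages must survive a disconnected consumer, a list or a stream is a better fit than Pub/Sub.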

Use Cases

Use case | Redis feature | Pattern
Session storage | Strings + TTL | SET session:<token> <data> EX 3600
Rate limiting | INCR + TTL | Count requests per minute per IP
Caching DB results | Strings + TTL | Cache SQL query results
Leaderboards | Sorted Sets | ZADD/ZREVRANGE
Job queues | Lists | LPUSH to enqueue, RPOP to dequeue
Real-time chat | Pub/Sub | PUBLISH/SUBSCRIBE
Unique counts | HyperLogLog | PFADD/PFCOUNT (approximate)

Column-Family Stores — Cassandra / HBase

Column-family stores organise data by rows and columns, but unlike relational databases, each row can have a different set of columns. They are optimised for high write throughput and for reading contiguous ranges of rows within a partition.

Data Model


The partition key determines which node holds the data. The clustering key determines the sort order within a partition.
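
The two roles can be sketched with a dict of sorted lists. The node names, sensor ids, and hash-modulo placement are assumptions made for the example. Cassandra itself places partitions on a token ring rather than by simple modulo.

```python
import bisect
import hashlib

NODES = ["node-a", "node-b", "node-c"]    # hypothetical cluster

def node_for(partition_key: str) -> str:
    # The partition key alone decides placement (simple hash-modulo here).
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:4], "big") % len(NODES)]

partitions = {}    # partition key -> list of (clustering key, columns), kept sorted

def insert(partition_key, clustering_key, columns):
    # Keep each partition ordered by clustering key, whatever the arrival order.
    rows = partitions.setdefault(partition_key, [])
    keys = [ck for ck, _ in rows]
    rows.insert(bisect.bisect_left(keys, clustering_key), (clustering_key, columns))

insert("sensor-1", "2024-01-01T10:05", {"temp": 21.4})
insert("sensor-1", "2024-01-01T10:00", {"temp": 21.1, "humidity": 40})
insert("sensor-2", "2024-01-01T10:00", {"temp": 19.8})

sensor_1_times = [ck for ck, _ in partitions["sensor-1"]]
```

All rows for sensor-1 live together and come back in timestamp order, and the two rows carry different column sets.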

CQL Queries (Cassandra Query Language)

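A typical Cassandra read names one partition and a range of clustering keys. The Python sketch below imitates that access pattern on in-memory data. The user_id and ts names, and the CQL shape quoted in the docstring, are illustrative assumptions rather than the original example.

```python
import bisect

# One partition's rows, already sorted by clustering key (timestamp).
events_by_user = {
    "user-1": [
        ("2024-03-01T09:00", "login"),
        ("2024-03-01T09:05", "view"),
        ("2024-03-01T09:30", "purchase"),
        ("2024-03-02T08:00", "login"),
    ],
}

def select_range(user_id, start, end, limit=None, newest_first=False):
    """Rows for one partition with start <= ts < end.

    Roughly: SELECT ... WHERE user_id = ? AND ts >= ? AND ts < ? [ORDER BY ts DESC] [LIMIT n]
    """
    rows = events_by_user.get(user_id, [])
    keys = [ts for ts, _ in rows]
    lo = bisect.bisect_left(keys, start)    # first row with ts >= start
    hi = bisect.bisect_left(keys, end)      # first row with ts >= end
    selected = rows[lo:hi]
    if newest_first:
        selected = selected[::-1]
    return selected[:limit] if limit is not None else selected

latest_two = select_range("user-1", "2024-03-01", "2024-03-02", limit=2, newest_first=True)
```

Because the rows are stored in clustering order, the range is a contiguous slice found by binary search, with no scan of unrelated partitions.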

Key Characteristic: Denormalise for Queries

In Cassandra, you design tables around your queries, not your data model. If you need to query by user AND by event_type, you create two separate tables — each optimised for one access pattern.
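
A minimal sketch of that idea, with two in-memory "tables" and hypothetical names: the application writes every event twice so that each query reads exactly one partition.

```python
from collections import defaultdict

# Two "tables", each keyed for exactly one query.
events_by_user = defaultdict(list)    # partition key: user_id
events_by_type = defaultdict(list)    # partition key: event_type

def record_event(user_id, event_type, ts):
    # The same event is written to both tables; there is no join at read time.
    events_by_user[user_id].append((ts, event_type))
    events_by_type[event_type].append((ts, user_id))

record_event("u1", "click", 1)
record_event("u2", "click", 2)
record_event("u1", "purchase", 3)

history_for_u1 = events_by_user["u1"]
all_clicks = events_by_type["click"]
```

The cost is extra storage and the need to keep both copies in step on every write. The benefit is that each read is a single-partition lookup.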

Use Cases

  • Time-series data: sensor readings, stock prices, application metrics
  • IoT at scale: millions of devices writing events per second
  • User activity logs: every click, impression, or event across billions of users
  • Messaging systems: storing chat message history at scale

Graph Databases — Neo4j

Graph databases store data as nodes (entities), edges (relationships), and properties on both. They shine when relationships between data are as important as the data itself.

[Diagram: nodes Alice, Bob, Carol, and a Post ("Hello!"), connected by FOLLOWS, LIKES, POSTED, and MENTIONS relationships]

Cypher Query Language

Create nodes and relationships:

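The property-graph model can be sketched in plain Python: nodes with properties, plus typed, directed edges that can carry properties of their own. The Graph class is hypothetical, and which node connects to which is an assumption made for the example.

```python
class Graph:
    """Minimal property graph: nodes with properties, typed directed edges."""

    def __init__(self):
        self.nodes = {}    # node id -> properties
        self.edges = []    # (source id, relationship type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel, dst, **props):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.edges.append((src, rel, dst, props))

    def neighbours(self, src, rel):
        # Targets reachable from `src` over one edge of type `rel`.
        return [d for s, r, d, _ in self.edges if s == src and r == rel]

g = Graph()
g.add_node("alice", label="Person", name="Alice")
g.add_node("bob", label="Person", name="Bob")
g.add_node("post1", label="Post", text="Hello!")
g.add_edge("alice", "FOLLOWS", "bob", since=2023)
g.add_edge("bob", "POSTED", "post1")
```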

Query — find friends of friends:

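A friends-of-friends query is a two-hop traversal. The sketch below runs it over an adjacency map. The people and FOLLOWS edges are invented, and the function name is hypothetical.

```python
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave"},
    "carol": {"dave", "erin", "alice"},
    "dave": set(),
    "erin": set(),
}

def friends_of_friends(graph, person):
    """People reachable in two FOLLOWS hops but not in one, excluding the person."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}

suggestions = friends_of_friends(follows, "alice")
```

In a relational schema the same question needs a self-join per hop. In a graph database each extra hop is one more step in the pattern.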

Recommendation — "People who bought X also bought":

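The "also bought" pattern walks from a product to its buyers and on to their other purchases, then counts. A minimal sketch over invented purchase data, with a hypothetical function name:

```python
from collections import Counter

purchases = {
    "alice": {"laptop", "mouse", "dock"},
    "bob": {"laptop", "mouse"},
    "carol": {"laptop", "bag"},
    "dave": {"desk"},
}

def also_bought(data, product, limit=3):
    """Rank other products by how many buyers of `product` also bought them."""
    counts = Counter()
    for basket in data.values():
        if product in basket:
            counts.update(basket - {product})
    # Sort by count (descending), then by name, so ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]

recs = also_bought(purchases, "laptop")
```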

Fraud detection — find circular transaction patterns:

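Circular money flows are cycles in the transfer graph. The sketch below finds paths that leave an account and return to it within a bounded number of transfers, using a depth-first walk. The accounts, the edge data, and the max_hops default are assumptions for illustration.

```python
transfers = {
    "acct-A": ["acct-B"],
    "acct-B": ["acct-C"],
    "acct-C": ["acct-A", "acct-D"],
    "acct-D": [],
}

def cycles_from(graph, start, max_hops=4):
    """Return paths that leave `start` and come back to it within `max_hops` transfers."""
    found = []

    def walk(node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                # Closing the loop: record the full path including the return hop.
                found.append(path + [nxt])
            elif nxt not in path and len(path) < max_hops:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found

rings = cycles_from(transfers, "acct-A")
```

Bounding the path length keeps the search cheap. Without a bound, cycle enumeration on a large graph can grow very quickly.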

Use Cases

  • Social networks: follows, likes, friend-of-friend queries
  • Fraud detection: circular money flows, unusual relationship patterns
  • Recommendation engines: collaborative filtering, content-based recommendations
  • Knowledge graphs: entities and their relationships in large ontologies
  • Access control: role hierarchies, permission inheritance

SQL vs NoSQL Decision Guide

Factor | Choose SQL | Choose NoSQL
Schema | Fixed, well-defined | Evolving, flexible
Relationships | Complex, many joins | Few or embedded
Transactions | Multi-table ACID required | Single-entity ops fine
Consistency | Strong required | Eventual acceptable
Scale | Vertical scaling adequate | Horizontal scale needed
Query pattern | Ad hoc, flexible queries | Known, repeated access patterns
Team familiarity | SQL expertise | NoSQL expertise
Data shape | Tabular rows | JSON, graphs, time-series

Choosing the Right NoSQL Model

Data shape / access pattern | Use
JSON objects with nested data | MongoDB (document)
Key lookups, caching, sessions | Redis (key-value)
Billions of time-ordered rows | Cassandra (column-family)
Relationship traversal | Neo4j (graph)
High write throughput, IoT | Cassandra or HBase
Full-text search | Elasticsearch

Polyglot Persistence

Production systems rarely use one database. A common e-commerce stack:

[Diagram: the web app talks to four stores. MySQL: orders, users, payments. MongoDB: catalog, reviews. Redis: sessions, cart, cache. Neo4j: recommendations, social graph.]

Each database does what it's best at. The application coordinates between them.


Summary

  • Document stores (MongoDB): flexible schemas, embedded data, rich queries. Best for content, catalogs, user data.
  • Key-value stores (Redis): in-memory speed, rich data structures. Best for caching, sessions, leaderboards.
  • Column-family (Cassandra): massive write throughput, time-series, denormalised for known query patterns.
  • Graph (Neo4j): when relationships between entities matter as much as the entities themselves.
  • CAP theorem: partition tolerance is non-negotiable in distributed systems. Choose CP or AP based on whether you need strong consistency or high availability.
  • NoSQL is not a replacement for SQL — it solves different problems. The best production systems use both.