Load Balancers & CDNs — Scaling Web Infrastructure

A single server has finite CPU, memory, and network capacity. Past a certain traffic volume, it cannot respond fast enough — or at all. Load balancers and content delivery networks (CDNs) are the two main tools for scaling beyond a single machine, and they operate at different layers of the stack.

Why You Need a Load Balancer#

Suppose your application runs on one server. The problems are:

Single point of failure — one hardware fault or deployment crash takes down the whole product.
Vertical scaling ceiling — you can only buy so much CPU/RAM for one machine, and it gets expensive fast.
No zero-downtime deploys — you can't restart the server without dropping connections.

A load balancer sits in front of a pool of backend servers and distributes incoming connections across them.

Now any single backend can be removed (for maintenance, upgrade, or failure) without users noticing — the load balancer stops sending traffic to it.

Layer 4 vs Layer 7 Load Balancing#

Layer 4 — Transport Layer#

The load balancer routes based on IP addresses and TCP/UDP ports. It never inspects the application payload.

Characteristics:

Very fast — minimal processing per packet.
Cannot make routing decisions based on URL, HTTP headers, or cookies.
Works for any TCP or UDP protocol (HTTPS, DNS, SMTP, game servers).
Maintains TCP session state — a connection to one backend stays on that backend.

Use case: High-throughput, low-latency routing where you don't need to inspect content. AWS NLB is a managed L4 load balancer.
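One common way for an L4 balancer to keep a connection on one backend is to hash the flow's addresses and ports. The sketch below assumes that approach; the function name and the backend addresses are hypothetical, and real balancers often keep a connection table as well.

```python
import hashlib


def pick_backend_for_flow(protocol, src_ip, src_port, dst_ip, dst_port, backends):
    """Choose a backend from the flow's 5-tuple only; the payload is never read."""
    flow = f"{protocol}|{src_ip}:{src_port}|{dst_ip}:{dst_port}"
    digest = hashlib.sha256(flow.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


# Hypothetical backend pool.
backends = ["10.0.1.10", "10.0.1.11", "10.0.1.12"]
first = pick_backend_for_flow("tcp", "203.0.113.7", 51234, "198.51.100.1", 443, backends)
again = pick_backend_for_flow("tcp", "203.0.113.7", 51234, "198.51.100.1", 443, backends)
```

Every packet of the same flow hashes to the same backend, which is why `first` and `again` are equal.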

Layer 7 — Application Layer#

The load balancer terminates the client connection, reads the HTTP request, then opens a separate connection to a backend.

What L7 can do that L4 cannot:

Route /api/* to one set of servers and /static/* to another.
Add or strip HTTP headers (e.g., inject X-Real-IP).
Route based on Host header to different virtual hosts.
Sticky sessions via cookie inspection.
Rewrite URLs, redirect HTTP to HTTPS.
Terminate TLS and perform SSL offloading.
WAF (Web Application Firewall) inspection.

Cost: More CPU per request than L4. AWS ALB (managed) and NGINX (software) are L7 load balancers.
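Content-based routing of the kind listed above can be sketched as an ordered rule table. This is a minimal illustration, not any particular product's configuration; the pool names, the `admin.example.com` host, and the first-match-wins policy are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    host: Optional[str]  # Host header to match; None matches any host
    path_prefix: str     # required start of the URL path
    pool: str            # backend pool to use when the rule matches


# Hypothetical rules, checked in order; the first match wins.
RULES = [
    Rule(host=None, path_prefix="/api/", pool="api-servers"),
    Rule(host=None, path_prefix="/static/", pool="static-servers"),
    Rule(host="admin.example.com", path_prefix="/", pool="admin-servers"),
]


def choose_pool(host: str, path: str, default: str = "web-servers") -> str:
    """Return the backend pool for a request based on its Host header and path."""
    for rule in RULES:
        host_matches = rule.host is None or rule.host == host.lower()
        if host_matches and path.startswith(rule.path_prefix):
            return rule.pool
    return default
```

An L4 balancer cannot make this decision because it never sees `host` or `path`.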

L4 vs L7 Comparison#

Feature	L4	L7
Latency	Very low	Low (slightly higher)
Content-based routing	No	Yes
TLS termination	No (pass-through) or Yes	Yes
Protocol support	Any TCP/UDP	HTTP, HTTPS, WebSocket, gRPC
Session persistence	By IP	By cookie or URL
Visibility	IP, port, TCP flags	Full HTTP request/response

Load Balancing Algorithms#

Round Robin#

Requests are distributed to backends in turn: 1, 2, 3, 1, 2, 3, ...

Simple and fair when all requests have similar cost and all backends are equal. Breaks down when requests have very different processing times or backends have different capacity.
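A minimal sketch of the rotation, assuming three equal backends with hypothetical names:

```python
from itertools import cycle

backends = ["backend-1", "backend-2", "backend-3"]  # hypothetical names
rotation = cycle(backends)


def next_backend() -> str:
    """Return the next backend in strict rotation, ignoring load and request cost."""
    return next(rotation)


picks = [next_backend() for _ in range(6)]
# picks == ["backend-1", "backend-2", "backend-3", "backend-1", "backend-2", "backend-3"]
```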

Weighted Round Robin#

Each backend gets a weight. A backend with weight 3 receives 3x as many requests as one with weight 1.

Use this when backends have different CPU/RAM (e.g., during a rolling upgrade with mixed instance sizes).
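The sketch below uses a "smooth" variant of weighted round robin, which interleaves picks instead of sending a burst to the heaviest backend. It is one possible implementation, and the backend names and weights are assumptions.

```python
class SmoothWeightedRoundRobin:
    def __init__(self, weights):
        self.weights = dict(weights)              # backend -> configured weight
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every backend earns credit equal to its weight on each round.
        for name, weight in self.weights.items():
            self.current[name] += weight
        # The backend with the most credit is chosen and pays back the total.
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


balancer = SmoothWeightedRoundRobin({"large": 3, "small": 1})
sequence = [balancer.pick() for _ in range(8)]
```

Over any window of four picks, `large` is chosen three times and `small` once, matching the 3:1 weights.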

Least Connections#

New requests go to the backend with the fewest active connections.

Best when requests have highly variable processing time (e.g., a mix of fast API calls and slow file uploads).
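A minimal sketch of the bookkeeping, assuming the balancer is told when each connection opens and closes. The class and method names are hypothetical.

```python
class LeastConnectionsBalancer:
    def __init__(self, backends):
        self.active = {name: 0 for name in backends}  # backend -> open connections

    def acquire(self) -> str:
        """Pick the backend with the fewest open connections and count the new one."""
        chosen = min(self.active, key=self.active.get)
        self.active[chosen] += 1
        return chosen

    def release(self, backend: str) -> None:
        """Call when a connection to `backend` finishes."""
        self.active[backend] -= 1


lb = LeastConnectionsBalancer(["a", "b"])
first = lb.acquire()   # both idle, so the first backend wins the tie
second = lb.acquire()  # "a" now has one connection, so "b" is chosen
lb.release(second)     # the fast request on "b" finishes
third = lb.acquire()   # "b" is idle again while "a" is still busy
```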

IP Hash (Session Affinity / Sticky Sessions)#

The client's IP address is hashed to always select the same backend.

Ensures the same client always reaches the same server. Useful for applications with server-side session state (though you should prefer stateless designs or shared session stores).

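Problem: With plain modulo hashing (hash(ip) mod n), changing the number of backends changes the result for most clients, not only those on the affected backend. Going from n to n+1 backends remaps roughly n/(n+1) of all clients, so nearly every client with server-side session state lands on a new server.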
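To get a feel for how disruptive a pool change is under modulo hashing, the sketch below hashes 10,000 synthetic client IPs and counts how many change backend when the pool grows from 4 to 5. The hash function and the address range are assumptions chosen for illustration.

```python
import hashlib
import ipaddress


def pick_by_ip_hash(client_ip: str, backend_count: int) -> int:
    """Return the backend index for a client using hash(ip) mod n."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % backend_count


def remapped_fraction(clients, old_count: int, new_count: int) -> float:
    """Fraction of clients whose backend index changes when the pool is resized."""
    moved = sum(
        1 for ip in clients
        if pick_by_ip_hash(ip, old_count) != pick_by_ip_hash(ip, new_count)
    )
    return moved / len(clients)


# Synthetic clients 10.0.0.0 onwards (an assumption, not real traffic).
clients = [str(ipaddress.IPv4Address(0x0A000000 + i)) for i in range(10_000)]
fraction = remapped_fraction(clients, old_count=4, new_count=5)
```

With a well-mixed hash, `fraction` comes out at roughly 0.8, that is, about four in five clients move.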

Consistent Hashing#

Used in distributed systems (Cassandra, Redis Cluster, CDN edge nodes).

Backends are placed on a virtual ring. A request's hash is mapped to the ring and routed to the nearest backend clockwise.

When a backend is added or removed, only the requests that were routed to it need to be remapped. On average that is 1/n of total requests — much less disruption than modulo hashing.
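The ring can be sketched with a sorted list and binary search. This is a minimal illustration, not the implementation used by any of the systems named above; placing each backend at several points ("virtual nodes", here `vnodes=100`) is a common refinement and an assumption of this sketch.

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    """Map a string to a position on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, backends, vnodes: int = 100):
        self.vnodes = vnodes
        self._points = []  # sorted (position, backend) pairs
        for backend in backends:
            self.add(backend)

    def add(self, backend: str) -> None:
        for i in range(self.vnodes):
            bisect.insort(self._points, (ring_position(f"{backend}#{i}"), backend))

    def remove(self, backend: str) -> None:
        self._points = [point for point in self._points if point[1] != backend]

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("ring is empty")
        # First point clockwise from the key, wrapping past the end of the list.
        index = bisect.bisect_right(self._points, (ring_position(key), ""))
        return self._points[index % len(self._points)][1]


ring = HashRing(["a", "b", "c"])
keys = [f"user-{i}" for i in range(2000)]
before = {key: ring.lookup(key) for key in keys}
ring.remove("b")
after = {key: ring.lookup(key) for key in keys}
```

Comparing `before` and `after` shows that only keys previously owned by `b` changed backend; every other key stays where it was.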

Health Checks#

A load balancer continuously checks that backends are alive and able to serve traffic.

Active Health Checks#

The load balancer proactively sends probe requests to backends.

The health endpoint should check the application's dependencies (database connection, cache connection) rather than just returning 200 blindly.
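An active check usually combines a probe with thresholds so that one lost probe does not flap a backend in or out of rotation. The sketch below assumes a `/health`-style URL passed in by the caller and thresholds of 3 failures and 2 successes; those values and all names are hypothetical.

```python
import urllib.error
import urllib.request


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the health URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


class BackendHealth:
    def __init__(self, unhealthy_after: int = 3, healthy_after: int = 2):
        self.unhealthy_after = unhealthy_after
        self.healthy_after = healthy_after
        self.healthy = True
        self._streak = 0  # consecutive probe results that disagree with `healthy`

    def record(self, probe_ok: bool) -> bool:
        """Feed in one probe result and return the backend's current state."""
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            needed = self.healthy_after if probe_ok else self.unhealthy_after
            if self._streak >= needed:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy
```

A scheduler would call `record(http_probe(url))` for each backend on a fixed interval and route only to backends whose state is healthy.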

Passive Health Checks#

The load balancer observes real responses as they flow through:

3 consecutive 5xx responses → mark backend as unhealthy.
Response time > 30 seconds → mark as slow/unhealthy.
TCP connection refused → immediately mark as down.

Passive checks detect failures faster for real traffic but cannot detect a failed backend before a real user's request is affected.
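The three rules above can be sketched as a small monitor fed with the outcome of each real request. The class name and the convention that `status=None` means the TCP connection was refused are assumptions; recovery is left out, since in practice it is usually handled by an active check or a retry timer.

```python
from typing import Optional


class PassiveMonitor:
    def __init__(self, max_consecutive_5xx: int = 3, slow_after_seconds: float = 30.0):
        self.max_consecutive_5xx = max_consecutive_5xx
        self.slow_after_seconds = slow_after_seconds
        self.consecutive_5xx = 0
        self.healthy = True

    def observe(self, status: Optional[int], elapsed_seconds: float) -> None:
        """Record one proxied request; status=None means the connection was refused."""
        if status is None:
            self.healthy = False          # refused: mark down immediately
            return
        if elapsed_seconds > self.slow_after_seconds:
            self.healthy = False          # too slow: treat as unhealthy
            return
        if 500 <= status <= 599:
            self.consecutive_5xx += 1
            if self.consecutive_5xx >= self.max_consecutive_5xx:
                self.healthy = False
        else:
            self.consecutive_5xx = 0      # any non-5xx response resets the streak
```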

Session Persistence (Sticky Sessions)#

Some legacy applications store session data locally in memory (not in Redis or a database). If a user's second request goes to a different backend, their session is lost.

Sticky sessions bind a user to one backend for the duration of their session.

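Cookie-based stickiness (L7): on a client's first response, the load balancer sets a cookie that identifies the chosen backend. Later requests carrying that cookie are routed to the same backend.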
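A minimal sketch of cookie-based routing, assuming a cookie named `lb_backend` (a hypothetical name) whose value is the backend's identifier. Real balancers typically sign or obfuscate this value.

```python
from http.cookies import SimpleCookie

COOKIE_NAME = "lb_backend"  # hypothetical cookie name


def route(cookie_header, backends, pick_new):
    """Return (backend, set_cookie_value); set_cookie_value is None if already bound."""
    cookies = SimpleCookie(cookie_header or "")
    morsel = cookies.get(COOKIE_NAME)
    if morsel is not None and morsel.value in backends:
        return morsel.value, None  # honour the existing binding
    # No cookie, or it names a backend that has left the pool: bind to a new one.
    chosen = pick_new()
    return chosen, f"{COOKIE_NAME}={chosen}; Path=/; HttpOnly"


backends = ["backend-1", "backend-2"]
new_user = route(None, backends, pick_new=lambda: "backend-1")
returning = route("lb_backend=backend-2; theme=dark", backends, pick_new=lambda: "backend-1")
```

Here `pick_new` stands in for whichever balancing algorithm the pool normally uses.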

Trade-offs:

	Sticky Sessions	Shared Session Store
Implementation	LB config only	Requires Redis/DB change
Scaling	Uneven load distribution possible	Even distribution
Fault tolerance	User loses session if their backend fails	Backend failure is transparent
Complexity	Low	Higher (session store is another dependency)

Prefer stateless backends with a shared session store (Redis) over sticky sessions for any new application.

SSL Termination#

Encrypting and decrypting TLS is CPU-intensive. Load balancers can handle it centrally.

Benefits:

Backends don't need TLS certificates or SSL libraries.
Certificate rotation happens in one place.
The load balancer can inspect HTTP headers for routing.

The connection between LB and backends is HTTP, but it's usually on a private network. If your compliance requirements demand encryption everywhere, use TLS passthrough (L4) or TLS re-encryption (L7 terminates client TLS, then establishes a new TLS connection to the backend).

CDN — Content Delivery Network#

A CDN is a globally distributed network of edge servers (PoPs — Points of Presence) that cache content close to users.

Cache Hit vs Miss#

Cache hit: the edge has the content cached locally. Responds immediately. No origin server involved.
Cache miss: the edge doesn't have the content. Fetches from the origin, caches it with the specified TTL, then serves it.

Cache hit ratio is the primary CDN performance metric. A 95% hit ratio means 95% of requests never reach your origin server.
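The hit and miss paths, and the resulting hit ratio, can be sketched with a tiny in-memory edge cache. The class is hypothetical and the clock is passed in as `now` so the behaviour is easy to follow; a real edge also honours `Cache-Control` headers and evicts entries under memory pressure.

```python
class EdgeCache:
    def __init__(self, fetch_from_origin, ttl_seconds: float):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._store = {}  # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url: str, now: float) -> bytes:
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1            # cache hit: the origin is not contacted
            return entry[1]
        self.misses += 1              # cache miss or expired entry: go to origin
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


origin_requests = []


def origin(url: str) -> bytes:
    origin_requests.append(url)
    return b"content of " + url.encode()


cache = EdgeCache(origin, ttl_seconds=60)
for now in (0, 10, 59, 61):           # the last request arrives after the TTL expired
    cache.get("/logo.png", now)
```

After these four requests the cache has served two hits and two misses, so `cache.hit_ratio` is 0.5 and the origin saw only two requests.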

Pull vs Push CDNs#

Pull CDN (most common):

Edge fetches content from origin on the first request (cache miss).
Subsequent requests are served from cache.
Content expires after TTL and is re-fetched on next request.
Examples: Cloudflare, and CloudFront in its default configuration.

Push CDN:

You explicitly upload content to CDN edge nodes.
Good for large files that you want pre-cached everywhere (game installers, software releases, video).
You control exactly what is on the edge — no cold-start latency.
Example: Akamai NetStorage, some CloudFront configurations.

What CDNs Cache#

Content Type	TTL Strategy	Notes
Static assets (JS, CSS, images)	Long (days/weeks)	Use cache-busting filenames (`app.abc123.js`)
HTML pages	Short (minutes) or no-cache	Dynamic content changes frequently
Video (HLS/DASH segments)	Medium (hours)	Each segment is a small fixed file
API responses	Short (seconds/minutes)	Only for cacheable, non-user-specific responses
Private/personalised content	Never cache on shared CDN	Use `Cache-Control: private`
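On the origin side, the table translates into `Cache-Control` response headers. The sketch below is one way to encode it; the extension lists, the `/api/` prefix, and the exact `max-age` values are assumptions picked to match the "long", "medium", and "short" rows.

```python
from pathlib import PurePosixPath

STATIC_EXTENSIONS = {".js", ".css", ".png", ".jpg", ".svg", ".woff2"}
SEGMENT_EXTENSIONS = {".ts", ".m4s"}  # typical HLS/DASH segment file types


def cache_control_for(path: str, personalised: bool = False) -> str:
    """Return a Cache-Control header value for a response served at `path`."""
    if personalised:
        return "private"                   # never stored by a shared CDN cache
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_EXTENSIONS:
        return "public, max-age=604800"    # one week; safe with hashed filenames
    if suffix in SEGMENT_EXTENSIONS:
        return "public, max-age=3600"      # one hour for video segments
    if path.startswith("/api/"):
        return "public, max-age=30"        # short TTL for shared API responses
    return "public, max-age=300"           # HTML pages: a few minutes
```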

Video Streaming: HLS and DASH#

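Large video files are not served as a single blob. They are split into small segments (2–10 seconds each), encoded at several bitrates, and served adaptively. A manifest file (an .m3u8 playlist for HLS, an .mpd file for DASH) lists the available quality levels and their segments.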

The player requests the manifest, picks an appropriate quality level based on available bandwidth, and fetches segments. If the network slows down, it switches to a lower-quality stream mid-playback.
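The quality decision the player makes before each segment can be sketched as follows. The bitrate ladder and the 0.8 safety margin are assumptions; real players also take buffer level and screen size into account.

```python
def choose_rendition(bitrates_kbps, measured_kbps: float, safety: float = 0.8) -> int:
    """Pick the highest bitrate that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    affordable = [rate for rate in sorted(bitrates_kbps) if rate <= budget]
    # If nothing fits, fall back to the lowest quality rather than stalling.
    return affordable[-1] if affordable else min(bitrates_kbps)


ladder = [400, 1200, 2500, 5000]                # hypothetical renditions in kbit/s
good_network = choose_rendition(ladder, 4000)   # budget 3200 -> 2500
slow_network = choose_rendition(ladder, 1000)   # budget 800  -> 400
```

Calling this before every segment fetch is what lets playback drop to a lower quality when throughput falls, and climb back when it recovers.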

Anycast#

Anycast is a routing technique where the same IP address is announced from multiple physical locations. BGP routes packets to the topologically nearest (lowest-cost) location, which is not always the geographically nearest one.

This is completely transparent to the user: they just see a single address, such as 1.1.1.1 (Cloudflare's public DNS resolver).

Why Anycast Matters for DDoS Mitigation#

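Distributed Denial of Service attacks generate massive traffic volumes aimed at one IP. With anycast, attack traffic from each region is routed to the nearest location announcing that IP, so no single site receives the whole flood.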

The attack is absorbed and distributed across the global network. Cloudflare's anycast network advertises 100+ Tbps of capacity, which makes it hard for a volumetric attack to saturate any single location.
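A toy model of how anycast splits attack traffic: each source region's traffic goes to whichever location has the lowest routing cost from that region. All region names, location names, costs, and volumes below are made up, and real BGP path selection involves more than a single cost number.

```python
def nearest_location(costs_from_region) -> str:
    """Return the location with the lowest routing cost from one region."""
    return min(costs_from_region, key=costs_from_region.get)


def load_per_location(attack_gbps_by_region, path_costs) -> dict:
    """Sum attack traffic arriving at each location under lowest-cost routing."""
    load = {}
    for region, gbps in attack_gbps_by_region.items():
        location = nearest_location(path_costs[region])
        load[location] = load.get(location, 0) + gbps
    return load


# Hypothetical routing costs from each source region to each location.
path_costs = {
    "europe": {"frankfurt": 1, "new-york": 5, "singapore": 6},
    "americas": {"frankfurt": 5, "new-york": 1, "singapore": 7},
    "asia": {"frankfurt": 4, "new-york": 6, "singapore": 1},
}
attack = {"europe": 300, "americas": 500, "asia": 200}  # Gbit/s, invented figures
load = load_per_location(attack, path_costs)
```

A 1,000 Gbit/s attack arrives as 300, 500, and 200 Gbit/s at three separate locations instead of hitting one site in full.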

Popular Tools and Services#

Software Load Balancers#

NGINX — HTTP server, reverse proxy, and L7 load balancer.

HAProxy — dedicated TCP and HTTP load balancer, known for high performance and detailed statistics.

Managed Cloud Load Balancers#

Service	Layer	Notes
AWS ALB (Application LB)	L7	HTTP/HTTPS/gRPC, host/path routing, WAF integration
AWS NLB (Network LB)	L4	TCP/UDP, extreme throughput, static IP, TLS passthrough
AWS CLB (Classic LB)	L4/L7	Legacy, prefer ALB/NLB
GCP Cloud Load Balancing	L4 + L7	Global anycast-based
Azure Load Balancer	L4	Basic TCP/UDP
Azure Application Gateway	L7	HTTP/HTTPS + WAF

CDN Providers#

Provider	Strength
Cloudflare	Large anycast network, DDoS mitigation, DNS, Zero Trust
AWS CloudFront	Deep AWS integration, Lambda@Edge for compute at edge
Akamai	One of the oldest CDNs, strong for enterprise and media streaming
Fastly	Varnish-based, instant cache purge, edge compute
Bunny.net	Cost-effective, good for video and static assets

Putting It Together#

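In a typical production architecture for a web application, users reach the CDN first. Requests the CDN cannot serve from cache go to a load balancer (an ALB in this example), which forwards them to auto-scaled compute instances backed by a database and a cache.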

Static assets (JS bundles, images, fonts) are served entirely from the CDN. Dynamic API requests pass through ALB to compute instances. The database and cache are never exposed to the load balancer directly.

This architecture handles traffic spikes (auto-scaling), single-server failures (LB health checks), and global users (CDN edge caching) — the three core scaling problems.
