Load Balancers & CDNs — Scaling Web Infrastructure

A single server has finite CPU, memory, and network capacity. Past a certain traffic volume, it cannot respond fast enough — or at all. Load balancers and content delivery networks (CDNs) are the two main tools for scaling beyond a single machine, and they operate at different layers of the stack.


Why You Need a Load Balancer

Suppose your application runs on one server. The problems are:

  • Single point of failure — one hardware fault or deployment crash takes down the whole product.
  • Vertical scaling ceiling — you can only buy so much CPU/RAM for one machine, and it gets expensive fast.
  • No zero-downtime deploys — you can't restart the server without dropping connections.

A load balancer sits in front of a pool of backend servers and distributes incoming connections across them.

Client
  → Load Balancer
      → Backend 1
      → Backend 2
      → Backend 3

Now any single backend can be removed (for maintenance, upgrade, or failure) without users noticing — the load balancer stops sending traffic to it.


Layer 4 vs Layer 7 Load Balancing

Layer 4 — Transport Layer

The load balancer routes based on IP addresses and TCP/UDP ports. It never inspects the application payload.

Characteristics:

  • Very fast — minimal processing per packet.
  • Cannot make routing decisions based on URL, HTTP headers, or cookies.
  • Works for any TCP or UDP protocol (HTTPS, DNS, SMTP, game servers).
  • Maintains TCP session state — a connection to one backend stays on that backend.

Use case: High-throughput, low-latency routing where you don't need to inspect content. AWS NLB is a managed L4 load balancer.

Layer 7 — Application Layer

The load balancer terminates the client connection, reads the HTTP request, then opens a separate connection to a backend.

Client
  → (HTTPS) Load Balancer (L7): terminates TLS, reads HTTP
      → Backend A: /api/users/*
      → Backend B: /api/orders/*

What L7 can do that L4 cannot:

  • Route /api/* to one set of servers and /static/* to another.
  • Add or strip HTTP headers (e.g., inject X-Real-IP).
  • Route based on Host header to different virtual hosts.
  • Sticky sessions via cookie inspection.
  • Rewrite URLs, redirect HTTP to HTTPS.
  • Terminate TLS and perform SSL offloading.
  • WAF (Web Application Firewall) inspection.
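The path-based routing in the first bullet can be sketched as a longest-prefix lookup. This is a minimal Python illustration rather than any particular load balancer's implementation; the pool names and the `route` function are hypothetical.

```python
# Hypothetical routing table: URL path prefix -> backend pool name.
ROUTES = {
    "/api/": "api-pool",
    "/static/": "static-pool",
    "/": "default-pool",
}


def route(path: str) -> str:
    """Return the pool whose prefix is the longest match for `path`."""
    matches = [prefix for prefix in ROUTES if path.startswith(prefix)]
    # Fall back to the catch-all entry if nothing matches at all.
    return ROUTES[max(matches, key=len, default="/")]


print(route("/api/users/42"))
print(route("/static/app.js"))
print(route("/index.html"))
```

An L4 balancer cannot do this because it never sees the request path, only addresses and ports.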

Cost: More CPU per request than L4. AWS ALB is a managed L7 load balancer; NGINX is a software one that you run yourself.

L4 vs L7 Comparison

Feature               | L4                       | L7
Latency               | Very low                 | Low (slightly higher)
Content-based routing | No                       | Yes
TLS termination       | No (pass-through) or Yes | Yes
Protocol support      | Any TCP/UDP              | HTTP, HTTPS, WebSocket, gRPC
Session persistence   | By IP                    | By cookie or URL
Visibility            | IP, port, TCP flags      | Full HTTP request/response

Load Balancing Algorithms

Round Robin

Requests are distributed to backends in turn: 1, 2, 3, 1, 2, 3, ...

Simple and fair when all requests have similar cost and all backends are equal. Breaks down when requests have very different processing times or backends have different capacity.
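As a minimal sketch of the rotation, assuming three equal backends with hypothetical names:

```python
from itertools import cycle

backends = ["backend-1", "backend-2", "backend-3"]  # hypothetical names
rotation = cycle(backends)


def next_backend() -> str:
    """Return the next backend in strict rotation."""
    return next(rotation)


picks = [next_backend() for _ in range(6)]
print(picks)
```

The selector keeps no information about how busy each backend is, which is why it degrades when request costs vary.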

Weighted Round Robin

Each backend gets a weight. A backend with weight 3 receives 3x as many requests as one with weight 1.

Use this when backends have different CPU/RAM (e.g., during a rolling upgrade with mixed instance sizes).
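One common way to implement this is "smooth" weighted round robin, which spreads the heavier backend's turns out instead of sending them in a burst. The sketch below is an illustration under that assumption; the backend names and weights are made up.

```python
def smooth_weighted_round_robin(weights: dict[str, int], count: int) -> list[str]:
    """Return `count` picks distributed in proportion to `weights`."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(count):
        # Every backend earns credit equal to its weight each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit is chosen and pays back the total.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        picks.append(chosen)
    return picks


picks = smooth_weighted_round_robin({"big": 3, "small": 1}, 8)
print(picks.count("big"), picks.count("small"))
```

Over any full cycle of 4 picks, "big" is chosen 3 times and "small" once.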

Least Connections

New requests go to the backend with the fewest active connections.

Best when requests have highly variable processing time (e.g., a mix of fast API calls and slow file uploads).
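A minimal sketch of the bookkeeping, assuming the balancer is told when each connection opens and closes (the class and backend names are hypothetical):

```python
class LeastConnections:
    def __init__(self, backends: list[str]):
        # Active connection count per backend.
        self.active = {name: 0 for name in backends}

    def acquire(self) -> str:
        """Pick the backend with the fewest active connections."""
        chosen = min(self.active, key=self.active.get)
        self.active[chosen] += 1
        return chosen

    def release(self, name: str) -> None:
        """Record that a connection to `name` has finished."""
        self.active[name] -= 1


lb = LeastConnections(["backend-1", "backend-2", "backend-3"])
first, second, third = lb.acquire(), lb.acquire(), lb.acquire()
lb.release(second)      # the fast request on the second backend completes
fourth = lb.acquire()   # so the next request goes back to that backend
print(first, second, third, fourth)
```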

IP Hash (Session Affinity / Sticky Sessions)

The client's IP address is hashed to always select the same backend.

Ensures the same client always reaches the same server. Useful for applications with server-side session state (though you should prefer stateless designs or shared session stores).

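Problem: If a backend is removed, all clients whose hash pointed to it are redistributed. With plain modulo hashing (hash mod n), changing the backend count n also remaps most of the other clients: going from n to n+1 backends moves about n/(n+1) of them.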
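To measure how disruptive a change in backend count is under plain modulo hashing, the sketch below hashes 10,000 synthetic client IPs and counts how many land on a different backend when the pool grows from 4 to 5. The address range and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib


def backend_index(client_ip: str, backend_count: int) -> int:
    """Map a client IP to a backend index with hash mod n."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return int.from_bytes(digest[:8], "big") % backend_count


# Synthetic client addresses, for illustration only.
clients = [f"10.0.{i // 256}.{i % 256}" for i in range(10_000)]
moved = sum(backend_index(ip, 4) != backend_index(ip, 5) for ip in clients)
moved_fraction = moved / len(clients)
print(f"remapped when growing 4 -> 5 backends: {moved_fraction:.0%}")
```

A client keeps its backend only when its hash gives the same remainder for both 4 and 5, which happens for about 1 in 5 hashes, so roughly 80% of clients move.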

Consistent Hashing

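Used in distributed systems such as Cassandra, Dynamo-style key-value stores, and CDN edge nodes. (Redis Cluster uses a related scheme of fixed hash slots rather than a hash ring.)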

Backends are placed on a virtual ring. A request's hash is mapped to the ring and routed to the nearest backend clockwise.

When a backend is added or removed, only the requests that were routed to it need to be remapped. On average that is 1/n of total requests — much less disruption than modulo hashing.
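The ring can be sketched with a sorted list and binary search. This is a minimal illustration, not a production implementation: the use of SHA-256, the 100 virtual points per backend (`vnodes`), and the backend names are all assumptions.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Place a string at a position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, backends, vnodes=100):
        # Each backend gets several points on the ring to even out its share.
        self._points = sorted(
            (ring_hash(f"{name}#{i}"), name)
            for name in backends
            for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def lookup(self, key: str) -> str:
        """Return the backend owning the first point clockwise from the key."""
        idx = bisect.bisect_right(self._hashes, ring_hash(key)) % len(self._points)
        return self._points[idx][1]


keys = [f"client-{i}" for i in range(10_000)]
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])
moved = [k for k in keys if before.lookup(k) != after.lookup(k)]
moved_fraction = len(moved) / len(keys)
print(f"remapped after adding one backend: {moved_fraction:.0%}")
```

Every key that moves ends up on the new backend "e"; keys owned by the other backends stay where they were.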


Health Checks

A load balancer continuously checks that backends are alive and able to serve traffic.

Active Health Checks

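The load balancer proactively sends probe requests to each backend on a fixed interval, for example an HTTP GET to a health endpoint, and marks a backend unhealthy after several consecutive failed probes.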

The health endpoint should check the application's dependencies (database connection, cache connection) rather than just returning 200 blindly.
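The decision logic behind active checks is usually a small state machine over probe results. The sketch below leaves out the probe itself and only tracks results; the `fall` and `rise` thresholds and their default values are hypothetical.

```python
class HealthTracker:
    def __init__(self, fall: int = 3, rise: int = 2):
        self.fall = fall        # consecutive failed probes before marking down
        self.rise = rise        # consecutive good probes before marking up again
        self.healthy = True
        self._streak = 0        # probes in a row that disagree with current state

    def record(self, probe_ok: bool) -> bool:
        """Feed one probe result and return the current health verdict."""
        if probe_ok == self.healthy:
            self._streak = 0
            return self.healthy
        self._streak += 1
        threshold = self.rise if probe_ok else self.fall
        if self._streak >= threshold:
            self.healthy = probe_ok
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
verdicts = [tracker.record(ok) for ok in [True, False, False, False, True, True]]
print(verdicts)
```

Requiring several results in a row before changing state keeps one dropped probe from removing a healthy backend.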

Passive Health Checks

The load balancer observes real responses as they flow through:

  • 3 consecutive 5xx responses → mark backend as unhealthy.
  • Response time > 30 seconds → mark as slow/unhealthy.
  • TCP connection refused → immediately mark as down.

Passive checks detect failures faster for real traffic but cannot detect a failed backend before a real user's request is affected.
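The three rules above can be sketched as a monitor that is fed the outcome of each real request. The thresholds come from the list; the class and parameter names are hypothetical, and recovery (marking a backend healthy again) is deliberately left out.

```python
class PassiveMonitor:
    MAX_CONSECUTIVE_5XX = 3
    MAX_RESPONSE_SECONDS = 30.0

    def __init__(self):
        self.healthy = True
        self._consecutive_5xx = 0

    def observe(self, status=None, seconds=0.0, refused=False) -> bool:
        """Record one real request outcome and return the health verdict."""
        if refused:
            self.healthy = False                 # connection refused: down now
        elif seconds > self.MAX_RESPONSE_SECONDS:
            self.healthy = False                 # too slow to be useful
        elif status is not None and 500 <= status <= 599:
            self._consecutive_5xx += 1
            if self._consecutive_5xx >= self.MAX_CONSECUTIVE_5XX:
                self.healthy = False
        else:
            self._consecutive_5xx = 0            # any good response resets the run
        return self.healthy


monitor = PassiveMonitor()
for status in (500, 502, 200, 500, 503):
    monitor.observe(status=status)
print(monitor.healthy)          # still healthy: the 200 broke the run of errors
print(monitor.observe(status=500))
```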


Session Persistence (Sticky Sessions)

Some legacy applications store session data locally in memory (not in Redis or a database). If a user's second request goes to a different backend, their session is lost.

Sticky sessions bind a user to one backend for the duration of their session.

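Cookie-based stickiness (L7): on a client's first response the load balancer sets a cookie that identifies the chosen backend; later requests carrying that cookie are routed to the same backend.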
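A minimal sketch of cookie-based stickiness follows. The cookie name `lb_backend`, the backend names, and the `pick_backend` function are hypothetical; real load balancers typically store an opaque or signed value rather than the backend name in plain text.

```python
from itertools import cycle

BACKENDS = ["backend-1", "backend-2", "backend-3"]  # hypothetical pool
COOKIE_NAME = "lb_backend"                          # hypothetical cookie name
_rotation = cycle(BACKENDS)


def pick_backend(cookies: dict[str, str]) -> tuple[str, dict[str, str]]:
    """Return (backend, cookies_to_set) for one incoming request."""
    pinned = cookies.get(COOKIE_NAME)
    if pinned in BACKENDS:
        return pinned, {}                 # honor the existing binding
    chosen = next(_rotation)              # new or stale client: assign a backend
    return chosen, {COOKIE_NAME: chosen}


first, set_cookies = pick_backend({})
second, _ = pick_backend(set_cookies)
print(first, second)
```

A cookie that names a backend no longer in the pool is ignored, so the client is simply assigned a new backend.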

Trade-offs:

Aspect          | Sticky Sessions                            | Shared Session Store
Implementation  | LB config only                             | Requires Redis/DB change
Scaling         | Uneven load distribution possible          | Even distribution
Fault tolerance | User loses session if their backend fails  | Backend failure is transparent
Complexity      | Low                                        | Higher (session store is another dependency)

Prefer stateless backends with a shared session store (Redis) over sticky sessions for any new application.


SSL Termination

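Encrypting and decrypting TLS is CPU-intensive. Load balancers can handle it centrally: the client connects to the load balancer over HTTPS, and the load balancer decrypts the traffic and forwards plain HTTP to the backends.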

Benefits:

  • Backends don't need TLS certificates or SSL libraries.
  • Certificate rotation happens in one place.
  • The load balancer can inspect HTTP headers for routing.

The connection between LB and backends is HTTP, but it's usually on a private network. If your compliance requirements demand encryption everywhere, use TLS passthrough (L4) or TLS re-encryption (L7 terminates client TLS, then establishes a new TLS connection to the backend).


CDN — Content Delivery Network

A CDN is a globally distributed network of edge servers (PoPs — Points of Presence) that cache content close to users.

Cache Hit vs Miss

  • Cache hit: the edge has the content cached locally. Responds immediately. No origin server involved.
  • Cache miss: the edge doesn't have the content. Fetches from the origin, caches it with the specified TTL, then serves it.

Cache hit ratio is the primary CDN performance metric. A 95% hit ratio means 95% of requests never reach your origin server.
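The hit/miss logic and the hit ratio can be sketched with a small TTL cache. This is an illustration only: `EdgeCache`, the `origin` function, and the 60-second TTL are hypothetical, and a real edge would also honor Cache-Control headers and evict entries under memory pressure.

```python
import time


class EdgeCache:
    def __init__(self, fetch_from_origin, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}          # url -> (expires_at, body)
        self.hits = 0
        self.misses = 0

    def get(self, url):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and entry[0] > now:
            self.hits += 1        # cache hit: origin is not contacted
            return entry[1]
        self.misses += 1          # cache miss or expired entry: go to origin
        body = self._fetch(url)
        self._store[url] = (now + self._ttl, body)
        return body

    @property
    def hit_ratio(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


origin_calls = []


def origin(url):
    origin_calls.append(url)
    return f"body of {url}"


cache = EdgeCache(origin, ttl_seconds=60)
for _ in range(20):
    cache.get("/app.abc123.js")
print(cache.hit_ratio, len(origin_calls))
```

Twenty requests for the same asset produce one miss and nineteen hits, a 95% hit ratio with a single origin fetch.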

Pull vs Push CDNs

Pull CDN (most common):

  • Edge fetches content from origin on the first request (cache miss).
  • Subsequent requests are served from cache.
  • Content expires after TTL and is re-fetched on next request.
  • Examples: Cloudflare, CloudFront with default config.

Push CDN:

  • You explicitly upload content to CDN edge nodes.
  • Good for large files that you want pre-cached everywhere (game installers, software releases, video).
  • You control exactly what is on the edge — no cold-start latency.
  • Examples: Akamai NetStorage, some CloudFront configurations.

What CDNs Cache

Content Type                    | TTL Strategy                | Notes
Static assets (JS, CSS, images) | Long (days/weeks)           | Use cache-busting filenames (app.abc123.js)
HTML pages                      | Short (minutes) or no-cache | Dynamic content changes frequently
Video (HLS/DASH segments)       | Medium (hours)              | Each segment is a small fixed file
API responses                   | Short (seconds/minutes)     | Only for cacheable, non-user-specific responses
Private/personalised content    | Never cache on shared CDN   | Use Cache-Control: private

Video Streaming: HLS and DASH

Large video files are not served as a single blob. They are split into small segments (2–10 seconds each), encoded at several quality levels, and listed in a manifest so the player can fetch them adaptively.

The player requests the manifest, picks an appropriate quality level based on available bandwidth, and fetches segments. If the network slows down, it switches to a lower-quality stream mid-playback.
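The quality switch can be sketched as picking the highest rendition that fits the measured bandwidth. The bitrate ladder and the 0.8 safety margin below are made-up values for illustration; real players use more elaborate heuristics that also consider buffer level.

```python
# Hypothetical bitrate ladder: (label, required bandwidth in kbit/s), lowest first.
RENDITIONS = [("240p", 400), ("480p", 1200), ("720p", 2800), ("1080p", 5000)]


def choose_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Pick the highest rendition that fits within a fraction of measured bandwidth."""
    budget = measured_kbps * safety
    chosen = RENDITIONS[0][0]            # always fall back to the lowest quality
    for label, required_kbps in RENDITIONS:
        if required_kbps <= budget:
            chosen = label
    return chosen


print(choose_rendition(8000))   # fast connection
print(choose_rendition(2000))   # network slows down mid-playback
```

Because each segment is an independent file, the player can apply this choice again before every segment request.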


Anycast

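Anycast is a routing technique where the same IP address is announced from multiple physical locations. BGP routes packets to the topologically nearest (lowest cost) location. For example, Cloudflare's public DNS resolver 1.1.1.1 is announced from many of its data centers, and each user's packets reach a nearby one.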

This is completely transparent to the user — they just see 1.1.1.1.

Why Anycast Matters for DDoS Mitigation

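Distributed Denial of Service attacks generate massive traffic volumes aimed at one IP. With anycast, that IP is served from many locations, so each attacking machine's traffic lands at the location nearest to it instead of converging on a single site.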

The attack is absorbed and distributed across the global network. Cloudflare's anycast network has 100+ Tbps of capacity, which makes volumetric DDoS attacks impractical to sustain.


Software Load Balancers

NGINX — HTTP server, reverse proxy, and L7 load balancer.

HAProxy — dedicated TCP and HTTP load balancer, known for high performance and detailed statistics.

Managed Cloud Load Balancers

Service                   | Layer   | Notes
AWS ALB (Application LB)  | L7      | HTTP/HTTPS/gRPC, host/path routing, WAF integration
AWS NLB (Network LB)      | L4      | TCP/UDP, extreme throughput, static IP, TLS passthrough
AWS CLB (Classic LB)      | L4/L7   | Legacy, prefer ALB/NLB
GCP Cloud Load Balancing  | L4 + L7 | Global anycast-based
Azure Load Balancer       | L4      | Basic TCP/UDP
Azure Application Gateway | L7      | HTTP/HTTPS + WAF

CDN Providers

Provider       | Strength
Cloudflare     | Largest anycast network, DDoS mitigation, DNS, Zero Trust
AWS CloudFront | Deep AWS integration, Lambda@Edge for compute at edge
Akamai         | Oldest CDN, strong for enterprise and media streaming
Fastly         | Varnish-based, instant cache purge, edge compute
Bunny.net      | Cost-effective, good for video and static assets

Putting It Together

A typical production architecture for a web application:

Internet
  → Cloudflare (CDN + DDoS protection + WAF)
  → AWS ALB (L7 load balancer): TLS termination · /api/* → API · /* → Frontend
      → API Target Group: EC2/ECS (stateless, auto-scaling)
          → RDS (PostgreSQL) + ElastiCache (Redis)
      → Frontend Target Group: S3 + CloudFront (static assets, long TTL)

Static assets (JS bundles, images, fonts) are served entirely from the CDN. Dynamic API requests pass through ALB to compute instances. The database and cache are never exposed to the load balancer directly.

This architecture handles traffic spikes (auto-scaling), single-server failures (LB health checks), and global users (CDN edge caching) — the three core scaling problems.