Network Troubleshooting — Tools & Techniques
Network problems have a reputation for being hard to debug. They aren't — they're just unfamiliar. The key is working methodically through the layers rather than guessing. This article covers the standard diagnostic toolkit and a structured approach to the most common failure patterns.
The OSI Model as a Debugging Framework
When something doesn't connect, work from the bottom of the OSI stack upward. Each layer depends on the one below it.
Start at Layer 1, move up. If you can't ping the default gateway, don't bother debugging DNS. If you can ping the server by IP but not by name, the issue is DNS (Layer 7), not routing (Layer 3).
Most software engineers live at Layers 3–7. The questions to ask at each layer:
| Layer | Question |
|---|---|
| 3 – Network | Can I reach the IP? (ping, traceroute) |
| 4 – Transport | Is the port open? (telnet host port, nc, nmap) |
| 7 – Application | Is the service responding correctly? (curl, dig, openssl) |
ping — Measuring Reachability and Latency
ping sends ICMP Echo Request packets and waits for ICMP Echo Reply packets.
Reading ping Output
| Field | What it tells you |
|---|---|
| icmp_seq | Sequence number; gaps mean lost packets |
| ttl | Time to Live remaining on the reply. Each router decrements it by 1, so a value well below the sender's initial TTL (commonly 64, 128, or 255) suggests many hops. |
| time | Round-trip time (RTT) in milliseconds |
| Packet loss % | % of sent packets that didn't return |
| stddev | Jitter — high values mean inconsistent latency |
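To make the summary fields concrete, here is a minimal sketch that derives them from a list of RTT samples. The function name `summarize_ping` and the sample values are illustrative, and real ping implementations may compute the deviation slightly differently (Linux labels the field mdev).

```python
import statistics

def summarize_ping(rtts_ms, sent):
    """Summarize RTT samples roughly the way ping's closing lines do.

    rtts_ms: RTTs in milliseconds for the replies that came back.
    sent:    number of echo requests that were sent.
    """
    received = len(rtts_ms)
    loss_pct = 100.0 * (sent - received) / sent
    if not rtts_ms:
        return {"loss_pct": loss_pct}
    return {
        "loss_pct": loss_pct,
        "min": min(rtts_ms),
        "avg": statistics.fmean(rtts_ms),
        "max": max(rtts_ms),
        # Population standard deviation, used here as a jitter estimate.
        "stddev": statistics.pstdev(rtts_ms),
    }

# Made-up samples: five requests sent, four replies received.
summary = summarize_ping([10.0, 12.0, 14.0, 12.0], sent=5)
```

A gap in icmp_seq shows up here as `sent` being larger than the number of samples.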
What the Numbers Mean
Latency (RTT):
- 1–5ms: LAN or nearby server
- 10–40ms: Same country / same region
- 80–120ms: Cross-Atlantic (London to New York)
- 150–250ms: Long-haul intercontinental (London to Tokyo)
- >300ms: Likely a routing issue or satellite link
Packet loss:
- 0%: Normal
- 1–5%: Marginal — noticeable in VoIP/video calls
- >5%: Significant. TCP will retransmit heavily, perceived as a slow connection
- 100%: ICMP may be firewalled (try a different probe), or host is unreachable
Important: ICMP is often rate-limited or blocked by firewalls. A failed ping does not always mean the host is down — it may mean ICMP is filtered. Always follow up with a TCP-level check.
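One possible TCP-level follow-up is to attempt a handshake directly. The sketch below uses only the standard library; `tcp_reachable` is a name chosen for this example, and the measured time includes name resolution when `host` is a hostname.

```python
import socket
import time

def tcp_reachable(host, port, timeout=3.0):
    """Return (True, connect_time_ms) if a TCP handshake completes, else (False, None)."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.perf_counter() - start) * 1000
            return True, elapsed_ms
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts, and DNS failures.
        return False, None
```

If ping fails but `tcp_reachable(host, 443)` succeeds, the host is up and ICMP is most likely being filtered.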
traceroute / tracert — Mapping the Path
traceroute reveals every router (hop) between you and the destination and measures the RTT to each.
How traceroute Works
It sends probe packets (UDP by default on Linux and macOS, ICMP with Windows tracert) with TTL starting at 1 and increasing by 1 for each hop. Each router decrements TTL; when it hits 0, the router discards the packet and sends back an ICMP "Time Exceeded" message, revealing its IP.
Reading traceroute Output
Asterisks * * *: The router at that hop either doesn't send ICMP Time Exceeded, or a firewall drops it. This doesn't mean the path is broken — traffic still passes through. Look at the hop before and after.
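Latency jumps: A sudden increase at a specific hop that persists through the later hops (e.g., hop 4 is 15ms, hop 5 and everything after it is 150ms) points to a slow link between those two routers, often an undersea cable or a congested peering point. A spike at a single hop that disappears at the next one usually means that router gives ICMP replies low priority, not that the link is slow.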
Asymmetric routing: Traceroute shows the forward path only. Return packets may take a completely different route.
dig / nslookup — DNS Debugging
dig
dig is the most capable DNS query tool.
dig +trace — Full Resolution Path
With +trace, dig performs the resolution itself, iteratively: it queries the root servers, follows each referral, and ends at the authoritative nameserver, printing the answer from every step. This is useful for diagnosing delegation problems or TTL issues at a specific level.
Common DNS Record Types
| Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 address | example.com → 2606:2800::1 |
| CNAME | Canonical name (alias) | www → example.com |
| MX | Mail server | gmail.com → gmail-smtp-in.l.google.com |
| TXT | Arbitrary text (SPF, DKIM, DMARC, verification) | v=spf1 include:... |
| NS | Authoritative nameservers | example.com → ns1.example.com |
| PTR | Reverse DNS (IP to name) | 34.216.184.93.in-addr.arpa → example.com |
| SOA | Start of Authority | Serial number, refresh interval, primary NS |
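To show what a record type means on the wire, the following sketch builds a single-question DNS query by hand. The type codes are the standard numeric values; the helper names (`encode_name`, `build_query`) and the query ID are assumptions made for this example, and parsing the reply is left out.

```python
import ipaddress
import struct

# Numeric codes for the record types in the table above.
QTYPE = {"A": 1, "NS": 2, "CNAME": 5, "SOA": 6, "PTR": 12,
         "MX": 15, "TXT": 16, "AAAA": 28}

def encode_name(name):
    """Encode 'example.com' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.split("."):
        if not label:
            continue  # ignore the empty label left by a trailing dot
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"

def build_query(name, rtype, query_id=0x1234):
    """Build a DNS query with one question and the recursion-desired flag set."""
    # Header: id, flags (0x0100 = RD), qdcount=1, ancount=0, nscount=0, arcount=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question: encoded name, record type, class 1 (IN)
    question = encode_name(name) + struct.pack("!HH", QTYPE[rtype], 1)
    return header + question

query = build_query("example.com", "MX")

# The name to query for a PTR lookup is the reversed address under in-addr.arpa.
ptr_name = ipaddress.ip_address("93.184.216.34").reverse_pointer
```

Sending `query` over UDP to port 53 of a resolver would return the raw answer that dig decodes for you.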
netstat / ss — Viewing Connections and Listening Ports
ss is the modern replacement for netstat. Both show socket state.
TCP Socket States
| State | Meaning |
|---|---|
| LISTEN | Server is waiting for connections |
| ESTABLISHED | Active connection, data can flow |
| TIME_WAIT | Connection closed, waiting for late packets (lasts ~60s) |
| CLOSE_WAIT | Remote side closed, local app hasn't closed yet — often a bug |
| SYN_SENT | Client sent SYN, waiting for SYN-ACK |
| SYN_RECEIVED | Server got SYN, sent SYN-ACK, waiting for ACK |
| FIN_WAIT_1/2 | Connection teardown in progress |
Many TIME_WAIT entries are normal for a busy HTTP server (each short-lived connection leaves a TIME_WAIT socket). If you see many CLOSE_WAIT, there is likely an application bug — the app is not closing connections after the remote end closes them.
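A quick way to spot a CLOSE_WAIT pile-up is to count sockets per state. The sketch below tallies the first column of text shaped like `ss -tan` output; the sample rows and addresses are made up, and note that ss spells states with hyphens and abbreviates ESTABLISHED to ESTAB.

```python
from collections import Counter

# Illustrative sample in the shape of `ss -tan` output; addresses are made up.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:443         0.0.0.0:*
ESTAB      0      0      10.0.0.5:443        203.0.113.7:51234
TIME-WAIT  0      0      10.0.0.5:443        203.0.113.8:40022
CLOSE-WAIT 1      0      10.0.0.5:443        203.0.113.9:38110
CLOSE-WAIT 1      0      10.0.0.5:443        203.0.113.9:38112
"""

def count_states(ss_output):
    """Count sockets per state; the state is the first column of each data row."""
    rows = ss_output.strip().splitlines()[1:]  # skip the header row
    return Counter(row.split()[0] for row in rows if row.strip())

counts = count_states(SAMPLE)
```

A CLOSE-WAIT count that keeps growing between runs is the pattern that points to the application bug described above.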
tcpdump — Packet Capture
tcpdump captures raw packets from a network interface. It is the most powerful tool for debugging at the protocol level.
Useful Filters
What to Look For
TLS handshake (capture and filter for port 443):
DNS query:
curl — HTTP Debugging
curl is indispensable for testing HTTP endpoints.
The timing breakdown is extremely useful for pinpointing where latency is coming from:
- High time_namelookup: slow DNS resolver
- High time_connect: routing problem or far-away server
- High time_appconnect: slow TLS negotiation
- High time between time_appconnect and time_starttransfer: slow backend processing
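curl reports these timers as cumulative seconds since the start of the request, so the time spent in each phase is the difference between consecutive timers. A minimal sketch, assuming an HTTPS request (for plain HTTP, time_appconnect is 0) and using made-up sample values:

```python
def phase_breakdown(t):
    """Split curl's cumulative timers (seconds) into per-phase durations."""
    return {
        "dns": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls_handshake": t["time_appconnect"] - t["time_connect"],
        # Request sent, waiting for the first response byte.
        "server_wait": t["time_starttransfer"] - t["time_appconnect"],
        "download": t["time_total"] - t["time_starttransfer"],
    }

# Illustrative values only, in seconds.
sample = {
    "time_namelookup": 0.012,
    "time_connect": 0.045,
    "time_appconnect": 0.130,
    "time_starttransfer": 0.480,
    "time_total": 0.495,
}
phases = phase_breakdown(sample)
slowest = max(phases, key=phases.get)
```

Here the largest share falls in `server_wait`, which would point at backend processing instead of the network.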
nmap — Port Scanning
nmap probes a host to discover open ports and services. Only use on systems you own or have explicit permission to scan.
Common Issues and Diagnosis
"Can't connect to server"
Work through the layers:
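As a rough sketch of that walk in Python, the function below resolves the name first, then attempts a TCP handshake, and reports the first step that fails. The name `diagnose` and the message wording are assumptions for this example; a refused connection usually means no listener, but a firewall that sends resets looks the same.

```python
import socket

def diagnose(host, port, timeout=3.0):
    """Report the first step at which connecting to host:port fails."""
    # Name resolution: does the name map to an address at all?
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns: cannot resolve {host} ({exc})"
    family, socktype, proto, _, addr = infos[0]

    # Network and transport: can a TCP handshake complete with the first address?
    try:
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(addr)
    except ConnectionRefusedError:
        return f"transport: {addr[0]} answered but refused port {port}"
    except socket.timeout:
        return f"network or firewall: no answer from {addr[0]}:{port} in {timeout}s"
    except OSError as exc:
        return f"network: {exc}"
    return f"ok: TCP connection to {addr[0]}:{port} succeeded"
```

An "ok" result moves the investigation up to the application layer (TLS, HTTP status, response body).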
High Latency
Intermittent Packet Loss
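Look for the first hop where packet loss appears and then continues at every later hop through to the destination. The link between that hop and the previous one is the likely problem. Loss at a single hop, with the hops after it showing less or no loss, usually means that router is rate-limiting its ICMP replies and is not dropping forwarded traffic.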
SSL Certificate Errors
| Error | Likely Cause |
|---|---|
| certificate has expired | Cert past its notAfter date; renew it |
| certificate is not yet valid | Server clock wrong, or cert provisioned in advance |
| hostname mismatch | Cert issued for different domain; check SAN |
| unable to verify certificate chain | Intermediate cert missing from server config |
| self-signed certificate | Cert not signed by a trusted CA |
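Expiry is the easiest of these to check ahead of time. The sketch below uses Python's standard `ssl` module; the function names are chosen for this example, and `fetch_not_after` needs network access. With default verification, an already expired, mismatched, or untrusted certificate makes `wrap_socket` raise `ssl.SSLCertVerificationError`, whose message names the cause.

```python
import socket
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter timestamp (negative if expired)."""
    expires = ssl.cert_time_to_seconds(not_after)
    now = time.time() if now is None else now
    return (expires - now) / 86400

def fetch_not_after(host, port=443, timeout=5.0):
    """Connect with default verification and return the peer cert's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry(fetch_not_after("example.com"))` gives the remaining lifetime in days.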
Python: ICMP Ping and Traceroute with Raw Sockets
Raw sockets allow you to construct and send ICMP packets directly. This requires root/admin privileges.
Calculating the ICMP Checksum
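ICMP uses the 16-bit one's-complement Internet checksum described in RFC 1071. The following is a minimal sketch of it, not necessarily the article's original listing; `icmp_checksum` is a name chosen here.

```python
def icmp_checksum(data: bytes) -> int:
    """16-bit one's-complement Internet checksum (RFC 1071) over data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF
```

The checksum is computed with the header's checksum field set to zero. A receiver recomputes it over the whole message, checksum included, and expects 0.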
ICMP Ping
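Below is a sketch of a single-probe ping over an IPv4 raw socket. It repeats the checksum helper so the block stands alone. The names `build_echo_request` and `ping_once` and the payload are illustrative. Opening the raw socket requires root/admin privileges, and the timeout applies to each receive, not to the probe as a whole.

```python
import os
import socket
import struct
import time

def icmp_checksum(data):
    """16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(ident, seq, payload=b"ping-sketch"):
    """ICMP Echo Request: type 8, code 0, checksum, identifier, sequence, payload."""
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)  # checksum zero for now
    checksum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload

def ping_once(host, seq=1, timeout=2.0):
    """Send one echo request; return the RTT in ms, or None on timeout."""
    ident = os.getpid() & 0xFFFF
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as sock:
        sock.settimeout(timeout)
        start = time.perf_counter()
        sock.sendto(build_echo_request(ident, seq), (host, 0))
        try:
            while True:
                data, _ = sock.recvfrom(1024)
                # Raw IPv4 sockets deliver the IP header too; IHL gives its length.
                offset = (data[0] & 0x0F) * 4
                rtype, _, _, rid, rseq = struct.unpack("!BBHHH", data[offset:offset + 8])
                if rtype == 0 and rid == ident and rseq == seq:  # our Echo Reply
                    return (time.perf_counter() - start) * 1000
        except socket.timeout:
            return None
```

Matching on identifier and sequence number is what lets several ping processes share one host without reading each other's replies.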
Basic Traceroute
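A basic traceroute can be sketched with two sockets: a UDP socket that sends probes with an increasing TTL, and a raw ICMP socket that listens for the replies. The function names are illustrative, 33434 is the conventional traceroute base port, and the raw socket requires root/admin privileges. This simplified version treats any ICMP message that arrives as the answer to the current probe, which real implementations avoid by inspecting the quoted packet.

```python
import socket

def make_probe_socket(ttl):
    """UDP socket whose outgoing packets carry the given IP TTL."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
    return sock

def traceroute(host, max_hops=30, timeout=2.0, port=33434):
    """Yield (ttl, router_ip or None) for each hop on the way to host."""
    dest_ip = socket.gethostbyname(host)
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as recv:
        recv.settimeout(timeout)
        for ttl in range(1, max_hops + 1):
            with make_probe_socket(ttl) as probe:
                probe.sendto(b"", (dest_ip, port))
            try:
                # The sender of the ICMP message is the router where TTL hit 0.
                _, (router_ip, _) = recv.recvfrom(512)
            except socket.timeout:
                router_ip = None  # the real tool prints "*" for this hop
            yield ttl, router_ip
            if router_ip == dest_ip:
                break  # destination itself answered, so the path is complete
```

Iterating with `for ttl, ip in traceroute("example.com"): print(ttl, ip or "*")` prints one line per hop.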
Quick Reference Cheat Sheet
Systematic troubleshooting — bottom of the OSI stack to the top, one layer at a time — resolves almost every network problem you will encounter. The tools in this article cover every step of that process.