Network Troubleshooting — Tools & Techniques

Network problems have a reputation for being hard to debug. They aren't — they're just unfamiliar. The key is working methodically through the layers rather than guessing. This article covers the standard diagnostic toolkit and a structured approach to the most common failure patterns.


The OSI Model as a Debugging Framework

When something doesn't connect, work from the bottom of the OSI stack upward. Each layer depends on the one below it.


Start at Layer 1, move up. If you can't ping the default gateway, don't bother debugging DNS. If you can ping the server by IP but not by name, the issue is DNS (Layer 7), not routing (Layer 3).

Most software engineers live at Layers 3–7. The questions to ask at each layer:

| Layer | Question |
| --- | --- |
| 3 – Network | Can I reach the IP? (ping, traceroute) |
| 4 – Transport | Is the port open? (telnet host port, nc, nmap) |
| 7 – Application | Is the service responding correctly? (curl, dig, openssl) |

ping — Measuring Reachability and Latency

ping sends ICMP Echo Request packets and waits for ICMP Echo Reply packets.


Reading ping Output

| Field | What it tells you |
| --- | --- |
| icmp_seq | Sequence number; gaps mean lost packets |
| ttl | Time to Live; each router decrements it by 1, so a low value suggests many hops |
| time | Round-trip time (RTT) in milliseconds |
| Packet loss % | Percentage of sent packets that didn't return |
| stddev | Jitter (shown as mdev on Linux); high values mean inconsistent latency |
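
The fields above can also be read programmatically. The following is a minimal sketch, assuming the common Linux/macOS reply-line format; the helper names and the sample text are illustrative (the sample was written by hand, not captured from a real run).

```python
import re

# Matches a Linux/macOS-style reply line such as:
# "64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=11.6 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")


def parse_replies(output: str) -> list[dict]:
    """Extract sequence number, TTL, and RTT from each reply line."""
    replies = []
    for line in output.splitlines():
        match = REPLY_RE.search(line)
        if match:
            seq, ttl, rtt = match.groups()
            replies.append({"seq": int(seq), "ttl": int(ttl), "rtt_ms": float(rtt)})
    return replies


def missing_sequences(replies: list[dict], sent: int, first_seq: int = 1) -> list[int]:
    """Return the sequence numbers that never came back."""
    seen = {reply["seq"] for reply in replies}
    return [seq for seq in range(first_seq, first_seq + sent) if seq not in seen]


# Hand-written sample: reply 3 of 4 is missing.
SAMPLE = """\
64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=11.6 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=56 time=12.1 ms
64 bytes from 192.0.2.1: icmp_seq=4 ttl=56 time=11.9 ms
"""

replies = parse_replies(SAMPLE)
lost = missing_sequences(replies, sent=4)
loss_pct = 100 * len(lost) / 4
print(lost, f"{loss_pct:.0f}% loss")
```

macOS numbers replies from 0, so pass `first_seq=0` when parsing its output.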

What the Numbers Mean

Typical latency (RTT):

  • 1–5ms: LAN or nearby server
  • 10–40ms: Same country / same region
  • 80–120ms: Cross-Atlantic (London to New York)
  • 150–250ms: Europe to East Asia (London to Tokyo)
  • >300ms: Likely a routing issue or satellite link

Packet loss:

  • 0%: Normal
  • 1–5%: Marginal, noticeable in VoIP/video calls
  • >5%: Significant; TCP will retransmit heavily, perceived as a slow connection
  • 100%: ICMP may be firewalled (try a different probe), or host is unreachable

Important: ICMP is often rate-limited or blocked by firewalls. A failed ping does not always mean the host is down — it may mean ICMP is filtered. Always follow up with a TCP-level check.
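
A TCP-level check can be as simple as attempting the handshake. This is a minimal sketch using Python's standard `socket` module; the function name and the default timeout are assumptions chosen for illustration.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, DNS failures, unreachable hosts.
        return False
```

For example, `tcp_port_open("example.com", 443)` returning True while ping shows 100% loss would suggest ICMP is filtered and the host is up.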


traceroute / tracert — Mapping the Path

traceroute reveals every router (hop) between you and the destination and measures the RTT to each.


How traceroute Works

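It sends probe packets (UDP by default on Linux and macOS, ICMP on Windows tracert) with the TTL starting at 1 and increasing by one for each round of probes. Each router decrements the TTL; when it hits 0, the router drops the packet and sends back an ICMP "Time Exceeded" message, revealing its IP. The trace ends when the destination itself answers, with ICMP "Port Unreachable" for UDP probes or an Echo Reply for ICMP probes.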
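
The TTL mechanism is easy to see in a small simulation. The sketch below sends no packets; it models a fixed path of routers (hypothetical addresses from the documentation ranges) and shows which node answers each probe.

```python
def simulate_traceroute(path: list[str], max_hops: int = 30) -> list[tuple[int, str, str]]:
    """Simulate TTL-limited probes along a fixed path of router IPs.

    `path` lists each router in order, with the destination last.
    Returns (initial_ttl, responder, reply_type) for each probe sent.
    """
    results = []
    for initial_ttl in range(1, max_hops + 1):
        ttl = initial_ttl
        for index, node in enumerate(path):
            if index == len(path) - 1:
                # The probe arrived, so the destination itself replies.
                results.append((initial_ttl, node, "destination reached"))
                return results
            ttl -= 1  # each router decrements the TTL before forwarding
            if ttl == 0:
                # TTL expired here, so this router reveals itself.
                results.append((initial_ttl, node, "time exceeded"))
                break
    return results


for hop in simulate_traceroute(["192.0.2.1", "198.51.100.1", "203.0.113.9"]):
    print(hop)
```

Each probe gets exactly one hop further than the previous one, which is why the output lists the routers in path order.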


Reading traceroute Output

Asterisks * * *: The router at that hop either doesn't send ICMP Time Exceeded, or a firewall drops it. This doesn't mean the path is broken — traffic still passes through. Look at the hop before and after.

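Latency jumps: A sudden increase at a specific hop (e.g., hop 4 is 15ms, hop 5 is 150ms) that persists at every later hop points to a slow link between those two routers, often an undersea cable or a congested peering point. A spike at one hop that later hops do not share usually means that router is slow to generate ICMP replies, not that the path is slow.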

Asymmetric routing: Traceroute shows the forward path only. Return packets may take a completely different route.
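
One way to separate a real slow link from a router that is merely slow to answer probes is to check that the increase persists at later hops. The sketch below assumes you already have one RTT per hop; the function name and the 50 ms threshold are arbitrary choices for illustration.

```python
from typing import Optional


def find_persistent_jump(rtts_ms: list[Optional[float]], threshold_ms: float = 50.0) -> Optional[int]:
    """Return the 1-based hop where RTT rises by threshold_ms and stays high.

    rtts_ms[i] is the RTT to hop i+1, or None for a hop that showed * * *.
    """
    known = [(hop, rtt) for hop, rtt in enumerate(rtts_ms, start=1) if rtt is not None]
    for (_, prev_rtt), (hop, rtt) in zip(known, known[1:]):
        if rtt - prev_rtt < threshold_ms:
            continue
        later = [r for h, r in known if h > hop]
        # A real slow link keeps every later hop elevated as well.
        if all(r - prev_rtt >= threshold_ms for r in later):
            return hop
    return None
```

With the example from the text, `find_persistent_jump([1.2, 8.0, 14.0, 15.0, 150.0, 152.0, 155.0])` points at hop 5, while an isolated spike in the middle of the path is ignored.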


dig / nslookup — DNS Debugging

dig

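dig is the most capable of the standard DNS query tools: it prints the full response, including the status code, flags, TTLs, and the server that answered.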
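
To see what a tool like dig puts on the wire, here is a minimal sketch that builds a single-question DNS query by hand and summarizes the response header. It is not how dig is implemented; the function names are illustrative, and `send_query` needs a reachable resolver address that you supply.

```python
import random
import socket
import struct
from typing import Optional

QTYPES = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15, "TXT": 16, "AAAA": 28}


def encode_name(name: str) -> bytes:
    """Encode a domain name as length-prefixed labels ending in a zero byte."""
    encoded = b""
    for label in filter(None, name.split(".")):
        raw = label.encode("ascii")
        encoded += bytes([len(raw)]) + raw
    return encoded + b"\x00"


def build_query(name: str, qtype: str = "A", query_id: Optional[int] = None) -> bytes:
    """Build a single-question query with the recursion-desired flag set."""
    if query_id is None:
        query_id = random.randint(0, 0xFFFF)
    # Header: ID, flags (0x0100 = RD), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", QTYPES[qtype], 1)  # class IN
    return header + question


def send_query(server: str, name: str, qtype: str = "A", timeout: float = 3.0) -> dict:
    """Send the query over UDP port 53 and summarize the response header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype), (server, 53))
        response, _ = sock.recvfrom(4096)
    _, flags, _, ancount, _, _ = struct.unpack("!HHHHHH", response[:12])
    return {"rcode": flags & 0x000F, "answers": ancount, "truncated": bool(flags & 0x0200)}
```

An `rcode` of 0 means NOERROR and 3 means NXDOMAIN, the same status values dig reports in its header line.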


dig +trace — Full Resolution Path


dig +trace shows the complete resolution chain, from the root servers down to the authoritative nameserver. dig performs the iterative queries itself instead of asking your resolver to recurse, which makes it useful for diagnosing delegation problems or TTL issues at a specific level.


Common DNS Record Types

| Type | Purpose | Example |
| --- | --- | --- |
| A | IPv4 address | example.com → 93.184.216.34 |
| AAAA | IPv6 address | example.com → 2606:2800::1 |
| CNAME | Canonical name (alias) | www → example.com |
| MX | Mail server | gmail.com → gmail-smtp-in.l.google.com |
| TXT | Arbitrary text (SPF, DKIM, DMARC, verification) | v=spf1 include:... |
| NS | Authoritative nameservers | example.com → ns1.example.com |
| PTR | Reverse DNS (IP to name) | 34.216.184.93.in-addr.arpa → example.com |
| SOA | Start of Authority | Serial number, refresh interval, primary NS |
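
The PTR row shows the IP written backwards under in-addr.arpa. As a small illustration, Python's standard `ipaddress` module can produce that name for you; the wrapper function name is just for this example.

```python
import ipaddress


def ptr_name(ip: str) -> str:
    """Return the reverse-DNS name that a PTR lookup for this IP would query."""
    return ipaddress.ip_address(ip).reverse_pointer


print(ptr_name("93.184.216.34"))  # 34.216.184.93.in-addr.arpa
```

IPv6 addresses work the same way, but are reversed nibble by nibble under ip6.arpa.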

netstat / ss — Viewing Connections and Listening Ports

ss is the modern replacement for netstat. Both show socket state.


TCP Socket States

| State | Meaning |
| --- | --- |
| LISTEN | Server is waiting for connections |
| ESTABLISHED | Active connection, data can flow |
| TIME_WAIT | Connection closed, waiting for late packets (lasts ~60s) |
| CLOSE_WAIT | Remote side closed, local app hasn't closed yet; often a bug |
| SYN_SENT | Client sent SYN, waiting for SYN-ACK |
| SYN_RECEIVED | Server got SYN, sent SYN-ACK, waiting for ACK |
| FIN_WAIT_1/2 | Connection teardown in progress |

Many TIME_WAIT entries are normal for a busy HTTP server (each short-lived connection leaves a TIME_WAIT socket). If you see many CLOSE_WAIT, there is likely an application bug — the app is not closing connections after the remote end closes them.
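
Counting sockets per state makes a CLOSE_WAIT build-up easy to spot. The sketch below assumes the column layout printed by `ss -tan`, where the state is the first field and is abbreviated with hyphens (ESTAB, TIME-WAIT, CLOSE-WAIT). The sample text is hand-written for illustration.

```python
import collections


def count_states(ss_output: str) -> collections.Counter:
    """Count sockets per TCP state in `ss -tan` style output."""
    counts = collections.Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        if not fields or fields[0] == "State":
            continue  # skip blank lines and the header row
        counts[fields[0]] += 1
    return counts


# Hand-written sample in the shape of `ss -tan` output.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:443         0.0.0.0:*
ESTAB      0      0      10.0.0.5:443        10.0.0.9:51234
CLOSE-WAIT 1      0      10.0.0.5:443        10.0.0.7:40022
CLOSE-WAIT 1      0      10.0.0.5:443        10.0.0.7:40023
TIME-WAIT  0      0      10.0.0.5:443        10.0.0.8:39910
"""

counts = count_states(SAMPLE)
print(counts.most_common())
```

On a live Linux system, the input could come from `subprocess.run(["ss", "-tan"], capture_output=True, text=True).stdout`.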


tcpdump — Packet Capture

tcpdump captures raw packets from a network interface. It is the most powerful tool for debugging at the protocol level.


Useful Filters

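tcpdump filters use pcap-filter expressions: host, port, and net qualifiers, and protocol names such as tcp, udp, and icmp, combined with and, or, and not.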

What to Look For

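TLS handshake (capture and filter for port 443): expect the TCP three-way handshake (SYN, SYN-ACK, ACK) followed by the TLS ClientHello and ServerHello. A SYN that is retransmitted with no SYN-ACK points to a firewall or an unreachable host.

DNS query: expect a query to port 53 followed by a response carrying the answer records. A query that is repeated with no response suggests the resolver is unreachable or filtered.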

curl — HTTP Debugging

curl is indispensable for testing HTTP endpoints.


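curl's --write-out (-w) timing variables are extremely useful for pinpointing where latency is coming from. Each value is cumulative from the start of the request, so compare each one with the value before it:

  • High time_namelookup: slow DNS resolver
  • Large gap between time_namelookup and time_connect: routing problem or far-away server
  • Large gap between time_connect and time_appconnect: slow TLS negotiation
  • Large gap between time_appconnect and time_starttransfer: slow backend processing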
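
Because curl reports these timings as cumulative values, subtracting neighbors gives the time spent in each phase. This is a minimal sketch; the variable names match curl's -w variables, while the function name, phase labels, and sample numbers are made up for illustration.

```python
def phase_durations(t: dict[str, float]) -> dict[str, float]:
    """Convert curl's cumulative -w timings (seconds) into per-phase durations."""
    # time_appconnect is 0 for plain HTTP, so fall back to time_connect.
    handshake_done = t["time_appconnect"] or t["time_connect"]
    phases = {
        "dns": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls": handshake_done - t["time_connect"],
        "server_wait": t["time_starttransfer"] - handshake_done,
        "transfer": t["time_total"] - t["time_starttransfer"],
    }
    return {name: round(seconds, 6) for name, seconds in phases.items()}


# Made-up sample values in seconds.
sample = {
    "time_namelookup": 0.020,
    "time_connect": 0.050,
    "time_appconnect": 0.150,
    "time_starttransfer": 0.400,
    "time_total": 0.450,
}
phases = phase_durations(sample)
slowest = max(phases, key=phases.get)
print(phases, "slowest:", slowest)
```

In this sample the largest share is the wait between the end of the TLS handshake and the first byte, which would point at backend processing.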

nmap — Port Scanning

nmap probes a host to discover open ports and services. Only use on systems you own or have explicit permission to scan.


Common Issues and Diagnosis

"Can't connect to server"

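Work through the layers in order and stop at the first one that fails: resolve the name (dig), ping the resulting IP, check that the port accepts a TCP connection (nc or telnet), then test the application itself (curl, openssl).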
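
The same layered order can be scripted. The following is a rough sketch using only the standard library: it checks DNS, then TCP, then optionally TLS, and stops at the first failure. The function name, step labels, and return shape are assumptions for illustration, and it skips the ping step because ICMP needs raw sockets.

```python
import socket
import ssl


def diagnose(host: str, port: int, use_tls: bool = False, timeout: float = 3.0) -> list[tuple[str, bool, str]]:
    """Run DNS, TCP, and optional TLS checks in order; stop at the first failure."""
    steps = []
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        address = infos[0][4][0]
        steps.append(("dns", True, address))
    except socket.gaierror as exc:
        steps.append(("dns", False, str(exc)))
        return steps
    try:
        with socket.create_connection((address, port), timeout=timeout) as sock:
            steps.append(("tcp", True, f"{address}:{port} accepted the connection"))
            if use_tls:
                context = ssl.create_default_context()
                try:
                    # server_hostname enables SNI and hostname verification.
                    with context.wrap_socket(sock, server_hostname=host) as tls:
                        steps.append(("tls", True, str(tls.version())))
                except (ssl.SSLError, OSError) as exc:
                    steps.append(("tls", False, str(exc)))
    except OSError as exc:
        steps.append(("tcp", False, str(exc)))
    return steps
```

The last entry in the returned list names the layer to investigate: a failed "dns" step means there is no point looking at ports yet, and a failed "tls" step after a successful "tcp" step narrows the problem to certificates or protocol settings.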

High Latency

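Measure the baseline RTT with ping, use traceroute to find the hop where latency rises and stays high, and use curl's timing variables to separate DNS, connect, TLS, and backend time.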

Intermittent Packet Loss

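In a per-hop loss report (mtr produces one), look for the first hop where packet loss appears and continues through every later hop to the destination. The link between that hop and the previous one is the likely problem. Loss at a single hop that disappears at later hops usually means that router is rate-limiting its ICMP replies, not dropping forwarded traffic.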
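
The rule of thumb for reading per-hop loss can be written down directly: loss only counts if it carries through to the final hop. This sketch assumes you already have a list of loss percentages, one per hop; the function name and the 1% threshold are arbitrary.

```python
from typing import Optional


def first_lossy_hop(loss_pct: list[float], threshold: float = 1.0) -> Optional[int]:
    """Return the 1-based hop where loss starts and persists to the last hop."""
    for index, loss in enumerate(loss_pct):
        if loss >= threshold and all(later >= threshold for later in loss_pct[index:]):
            return index + 1
    return None


print(first_lossy_hop([0, 0, 40, 0, 0]))       # isolated loss at hop 3 is ignored
print(first_lossy_hop([0, 0, 0, 12, 10, 11]))  # loss from hop 4 reaches the destination
```

The first call returns None because later hops are clean, which is the pattern a rate-limiting router produces.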

SSL Certificate Errors

| Error | Likely Cause |
| --- | --- |
| certificate has expired | Cert past its notAfter date; renew it |
| certificate is not yet valid | Clock wrong on the validating machine, or cert provisioned in advance |
| hostname mismatch | Cert issued for a different domain; check the SAN |
| unable to verify certificate chain | Intermediate cert missing from server config |
| self-signed certificate | Cert not signed by a trusted CA |
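
These errors can also be surfaced from Python's standard `ssl` module, which validates the chain and hostname by default. The sketch below is one possible approach, with illustrative function names: it reports the verification message when validation fails and computes days left from a notAfter string.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_cert(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Return the server's validated leaf certificate as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def verification_error(host: str, port: int = 443) -> Optional[str]:
    """Return the verification failure message, or None if the cert is valid."""
    try:
        fetch_cert(host, port)
        return None
    except ssl.SSLCertVerificationError as exc:
        return exc.verify_message


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days remaining before a notAfter value such as 'Jan  1 00:00:00 2030 GMT'."""
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400
```

For a host with a valid certificate, `days_until_expiry(fetch_cert(host)["notAfter"])` gives a number you could alert on before the "certificate has expired" error ever appears.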

Python: ICMP Ping and Traceroute with Raw Sockets

Raw sockets allow you to construct and send ICMP packets directly. This requires root/admin privileges.

Calculating the ICMP Checksum

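
ICMP uses the standard Internet checksum from RFC 1071: the one's-complement of the one's-complement sum of the message's 16-bit words. The following is a minimal sketch of that calculation; the function name is illustrative.

```python
def icmp_checksum(data: bytes) -> int:
    """Compute the 16-bit one's-complement Internet checksum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


# Worked example bytes from RFC 1071.
print(hex(icmp_checksum(bytes.fromhex("0001f203f4f5f6f7"))))  # 0x220d
```

A receiver verifies a packet by running the same function over the whole message, checksum field included; a result of 0 means the packet is intact.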

ICMP Ping

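
Below is a minimal sketch of a single IPv4 ICMP ping over a raw socket. It assumes root/admin privileges, uses the process ID as the identifier, and carries its own copy of the checksum routine so the block is self-contained. Function names are illustrative, and error handling is kept to the bare minimum.

```python
import os
import socket
import struct
import time
from typing import Optional

ICMP_ECHO_REQUEST = 8
ICMP_ECHO_REPLY = 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over big-endian words (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum((data[i] << 8) | data[i + 1] for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"ping") -> bytes:
    """Pack an ICMP Echo Request: type, code, checksum, identifier, sequence."""
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    csum = internet_checksum(header + payload)  # computed with the field zeroed
    return struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, csum, identifier, sequence) + payload


def ping_once(host: str, sequence: int = 1, timeout: float = 2.0) -> Optional[float]:
    """Send one Echo Request; return the RTT in milliseconds, or None on timeout."""
    identifier = os.getpid() & 0xFFFF
    with socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP) as sock:
        start = time.perf_counter()
        sock.sendto(build_echo_request(identifier, sequence), (host, 0))
        while True:
            remaining = timeout - (time.perf_counter() - start)
            if remaining <= 0:
                return None
            sock.settimeout(remaining)
            try:
                packet, _ = sock.recvfrom(1024)
            except socket.timeout:
                return None
            rtt_ms = (time.perf_counter() - start) * 1000
            header_len = (packet[0] & 0x0F) * 4  # skip the IPv4 header
            icmp = packet[header_len:header_len + 8]
            if len(icmp) < 8:
                continue
            icmp_type, _, _, reply_id, reply_seq = struct.unpack("!BBHHH", icmp)
            # Ignore unrelated ICMP traffic delivered to the same raw socket.
            if (icmp_type, reply_id, reply_seq) == (ICMP_ECHO_REPLY, identifier, sequence):
                return rtt_ms
```

Calling `ping_once("192.0.2.1")` without sufficient privileges raises PermissionError when the raw socket is created, which is the usual first stumbling block.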

Basic Traceroute

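
The following is a simplified sketch of a UDP-probe traceroute for IPv4, assuming root/admin privileges for the raw ICMP receive socket. It sends one probe per TTL and treats the first ICMP packet that arrives as the answer, without matching it to the probe, so treat it as an illustration of the mechanism and not a replacement for the real tool. The function names and defaults are assumptions; 33434 is the conventional base port for UDP probes.

```python
import socket
import time
from typing import Optional


def format_hop(ttl: int, responder: Optional[str], rtt_ms: Optional[float]) -> str:
    """Render one hop the way traceroute does, with * * * for silence."""
    if responder is None:
        return f"{ttl:2d}  * * *"
    return f"{ttl:2d}  {responder}  {rtt_ms:.1f} ms"


def traceroute(host: str, max_hops: int = 30, timeout: float = 2.0, base_port: int = 33434) -> list:
    """Return a list of (ttl, responder_ip_or_None, rtt_ms_or_None)."""
    dest_ip = socket.gethostbyname(host)
    hops = []
    for ttl in range(1, max_hops + 1):
        recv_sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_ICMP)
        send_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
        with recv_sock, send_sock:
            recv_sock.settimeout(timeout)
            # Limit how many routers this probe may cross.
            send_sock.setsockopt(socket.IPPROTO_IP, socket.IP_TTL, ttl)
            start = time.perf_counter()
            send_sock.sendto(b"", (dest_ip, base_port + ttl))
            try:
                # Simplification: any ICMP packet is accepted as the reply.
                _, (responder, _) = recv_sock.recvfrom(512)
                rtt_ms = (time.perf_counter() - start) * 1000
            except socket.timeout:
                responder, rtt_ms = None, None
        hops.append((ttl, responder, rtt_ms))
        if responder == dest_ip:
            break  # the destination answered, so the path is complete
    return hops
```

A typical use would be `for hop in traceroute("example.com"): print(format_hop(*hop))`, where the hostname is only an example target.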

Quick Reference Cheat Sheet

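  • ping: reachability, latency, packet loss
  • traceroute / tracert: path and per-hop RTT
  • dig / nslookup: DNS records and resolution path
  • ss / netstat: listening ports and socket states
  • tcpdump: packet capture at the protocol level
  • curl: HTTP requests and timing breakdown
  • nmap: open ports and services (only with permission)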

Systematic troubleshooting — bottom of the OSI stack to the top, one layer at a time — resolves almost every network problem you will encounter. The tools in this article cover every step of that process.