Networking & Protocols

Why Networking Matters

The Problem: Every distributed system relies on network communication. Choosing the wrong protocol can mean the difference between a snappy user experience and a sluggish, unreliable one.

The Solution: Understanding how TCP, UDP, HTTP, WebSockets, and DNS work gives you the ability to select the right communication patterns for each part of your system.

Real Impact: Discord uses WebSockets for real-time messaging, Netflix uses HTTP for streaming, and online games use UDP for low-latency updates -- each protocol chosen for specific reasons.

The TCP/IP Network Stack
| OSI Layer | Name | Examples | Handled by |
|---|---|---|---|
| 7 | Application | HTTP, HTTPS, WebSocket, gRPC, SMTP, DNS | Your code |
| 4 | Transport | TCP (reliable), UDP (fast) | OS kernel |
| 3 | Network (Internet) | IP addressing, routing, packets | Routers |
| 2 | Link (Data Link) | Ethernet, Wi-Fi, MAC addresses, frames | Hardware |

Data flows down the stack when sending and up when receiving.

TCP vs UDP

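TCP and UDP are the two main transport-layer protocols. Every application-layer protocol runs on top of one of them: HTTP/1.1, HTTP/2, and WebSocket use TCP, HTTP/3 uses UDP (via QUIC), and DNS uses UDP for most queries but falls back to TCP for large responses.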

TCP (Transmission Control Protocol)

Reliable, ordered delivery. Establishes a connection via a three-way handshake. Guarantees every byte arrives intact and in order. Used by HTTP, HTTPS, SSH, SMTP.

UDP (User Datagram Protocol)

Fast, unreliable delivery. No connection setup, no delivery guarantee. Packets may arrive out of order or not at all. Used by DNS, online games, VoIP, and live or real-time video (on-demand streaming such as Netflix typically uses HTTP over TCP).

| Feature | TCP | UDP |
|---|---|---|
| Connection | Connection-oriented (handshake) | Connectionless |
| Reliability | Guaranteed delivery via retransmission | Best-effort, no guarantees |
| Ordering | Data delivered to the application in order | No ordering guaranteed |
| Speed | Higher latency (overhead from guarantees) | Lower latency (minimal overhead) |
| Header size | 20-60 bytes | 8 bytes |
| Flow control | Yes (sliding window) | No |
| Use cases | Web, email, file transfer | Gaming, VoIP and live video, DNS, IoT |
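To make the table concrete, here is a minimal loopback sketch using Python's standard `socket` module; the messages and the one-shot echo server are illustrative assumptions. TCP needs `listen`/`accept`/`connect` before any data moves and delivers a byte stream, so the reader loops until the peer closes its side. UDP simply sends a self-contained datagram with `sendto` and reads it with `recvfrom`, with no connection at all.

```python
import socket
import threading

def recv_all(sock):
    """Read from a TCP socket until the peer closes its sending side.

    TCP is a byte stream, not a message protocol, so a single recv() call is
    not guaranteed to return a whole "message".
    """
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # b"" means the peer sent FIN (no more data)
            break
        chunks.append(chunk)
    return b"".join(chunks)

def tcp_echo_once(server_sock):
    conn, _addr = server_sock.accept()  # returns once the three-way handshake completed
    with conn:
        conn.sendall(recv_all(conn))    # echo everything back, then close

# --- TCP: connection-oriented, reliable, ordered ---
tcp_server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
tcp_server.bind(("127.0.0.1", 0))  # port 0: let the OS choose a free port
tcp_server.listen(1)
tcp_port = tcp_server.getsockname()[1]
worker = threading.Thread(target=tcp_echo_once, args=(tcp_server,))
worker.start()

with socket.create_connection(("127.0.0.1", tcp_port), timeout=5) as client:
    client.sendall(b"hello over TCP")
    client.shutdown(socket.SHUT_WR)  # tell the server we are done sending
    tcp_reply = recv_all(client)
worker.join()
tcp_server.close()

# --- UDP: connectionless, best-effort datagrams ---
udp_server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_server.bind(("127.0.0.1", 0))
udp_server.settimeout(5)  # on a real network the datagram might never arrive
udp_client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
udp_client.sendto(b"hello over UDP", udp_server.getsockname())  # no handshake
udp_payload, _sender = udp_server.recvfrom(1024)
udp_client.close()
udp_server.close()

print(tcp_reply)    # b'hello over TCP'
print(udp_payload)  # b'hello over UDP'
```

On loopback the UDP datagram practically always arrives; across a real network the `settimeout` is what keeps the receiver from waiting forever for a lost packet.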

Real-World Analogy

TCP is like sending a registered letter: you get confirmation it was delivered, it arrives in order, and if it gets lost, it is resent.

UDP is like shouting across a room: fast, but some words might get lost, and there is no confirmation the listener heard you.

HTTP/HTTPS and HTTP/2

HTTP (Hypertext Transfer Protocol) is the foundation of web communication. HTTP/1.1 and HTTP/2 run on top of TCP (HTTP/3 uses QUIC over UDP), and all versions follow a request-response pattern.

HTTP Request/Response Flow (Client/Browser -> Server/API)

Request:
GET /api/users HTTP/1.1
Host: api.example.com
Authorization: Bearer token123

Response:
HTTP/1.1 200 OK
Content-Type: application/json

{"users": [{"id": 1, "name": "Alice"}, ...]}

Stateless: each request is independent; the server does not remember previous requests.
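To see that HTTP/1.1 really is plain text on a TCP connection, the sketch below starts a throwaway local server with Python's `http.server` (a hypothetical stand-in for `api.example.com`) and sends the request shown above over a raw socket. The `Connection: close` header is an addition so the server closes the connection after responding, which lets the client simply read until EOF.

```python
import json
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class UsersHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for api.example.com serving GET /api/users."""
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        body = json.dumps({"users": [{"id": 1, "name": "Alice"}]}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), UsersHandler)
threading.Thread(target=server.handle_request, daemon=True).start()  # serve one request

# Request line, headers, then a blank line, all as text over TCP
request = (
    "GET /api/users HTTP/1.1\r\n"
    "Host: api.example.com\r\n"
    "Authorization: Bearer token123\r\n"
    "Connection: close\r\n"
    "\r\n"
)
with socket.create_connection(server.server_address, timeout=5) as sock:
    sock.sendall(request.encode())
    raw = b""
    while chunk := sock.recv(4096):  # read until the server closes the connection
        raw += chunk
server.server_close()

head, _, body = raw.decode().partition("\r\n\r\n")
print(head.splitlines()[0])  # HTTP/1.1 200 OK
print(body)                  # {"users": [{"id": 1, "name": "Alice"}]}
```

Nothing in the exchange carries memory of earlier requests: the `Authorization` header has to travel with every request, which is exactly the statelessness described above.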

HTTP Methods

| Method | Purpose | Idempotent? | Has Body? |
|---|---|---|---|
| GET | Retrieve a resource | Yes | No |
| POST | Create a new resource | No | Yes |
| PUT | Replace a resource entirely | Yes | Yes |
| PATCH | Partially update a resource | No | Yes |
| DELETE | Remove a resource | Yes | Optional |
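Idempotency is easiest to see by replaying a request. The sketch below uses a hypothetical in-memory `UserStore` (a stand-in for an API server, not any real framework): sending the same PUT or DELETE twice leaves the same state, while sending the same POST twice creates two resources. This is why clients and proxies can safely retry idempotent methods after a timeout.

```python
import copy

class UserStore:
    """Hypothetical in-memory stand-in for a REST resource collection."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def post(self, body):
        # POST: the server assigns a new ID every time, so it is not idempotent
        user_id = self.next_id
        self.next_id += 1
        self.users[user_id] = dict(body)
        return user_id

    def put(self, user_id, body):
        # PUT: replace the whole resource at a known ID, so repeats are harmless
        self.users[user_id] = dict(body)

    def delete(self, user_id):
        # DELETE: the resource is gone after one call or many
        self.users.pop(user_id, None)

store = UserStore()
bob = {"name": "Bob", "email": "bob@example.com"}

store.post(bob)
store.post(bob)                      # a retried POST creates a duplicate
print(len(store.users))              # 2

store.put(2, {"name": "Bob Smith"})
snapshot = copy.deepcopy(store.users)
store.put(2, {"name": "Bob Smith"})  # a retried PUT changes nothing
print(store.users == snapshot)       # True

store.delete(2)
store.delete(2)                      # a retried DELETE is harmless
print(sorted(store.users))           # [1]
```

A real server may answer the second DELETE with 404; idempotency is about the resulting server state, not about identical response codes.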
http_requests.py
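import requests  # pip install requests

BASE_URL = "https://api.example.com/api"  # example API from the request/response flow
HEADERS = {"Authorization": "Bearer token123"}

def show(method, path, response):
    """Print the status line and body in the format used in the output below."""
    print(f"{method} {path} -> {response.status_code} {response.reason}")
    print(f"Response: {response.text or '(empty body)'}")
    print()

# GET request - retrieve data
response = requests.get(f"{BASE_URL}/users", headers=HEADERS, timeout=5)
show("GET", "/api/users", response)  # 200 OK

# POST request - create data
# json= serializes the dict and sets Content-Type: application/json automatically
new_user = {"name": "Bob", "email": "bob@example.com"}
response = requests.post(f"{BASE_URL}/users", json=new_user, headers=HEADERS, timeout=5)
show("POST", "/api/users", response)  # 201 Created

# PUT request - replace the entire resource
updated_user = {"name": "Bob Smith", "email": "bob@example.com"}
response = requests.put(f"{BASE_URL}/users/2", json=updated_user, headers=HEADERS, timeout=5)
show("PUT", "/api/users/2", response)  # 200 OK

# DELETE request - remove the resource
response = requests.delete(f"{BASE_URL}/users/2", headers=HEADERS, timeout=5)
show("DELETE", "/api/users/2", response)  # 204 No Content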
Output
$ python http_requests.py
GET /api/users -> 200 OK
Response: [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]

POST /api/users -> 201 Created
Response: {"id": 3, "name": "Bob", "email": "bob@example.com"}

PUT /api/users/2 -> 200 OK
Response: {"id": 2, "name": "Bob Smith", "email": "bob@example.com"}

DELETE /api/users/2 -> 204 No Content
Response: (empty body)
Key Takeaway: HTTP is stateless by design -- each request is independent. This makes HTTP easy to scale horizontally (any server can handle any request), but means you need external mechanisms (cookies, tokens, sessions) to maintain user state across requests.

HTTP/1.1 vs HTTP/2 vs HTTP/3

HTTP/1.1

Each TCP connection carries one request at a time, so a slow response blocks the requests queued behind it (head-of-line blocking). Browsers open about 6 parallel connections per host as a workaround. Text-based headers are sent in full with every request.

HTTP/2

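Multiplexing: many requests share one TCP connection. Binary protocol (more efficient). Header compression with HPACK. Server push (now deprecated and removed from Chrome). Widely deployed on modern websites. However, one lost TCP packet still stalls every stream on the connection (TCP-level head-of-line blocking).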

HTTP/3

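Runs on QUIC (UDP-based) instead of TCP. Eliminates TCP head-of-line blocking, because a lost packet only stalls the stream it belongs to. Faster connection setup: QUIC combines the transport and TLS 1.3 handshakes into 1 RTT, and resumed connections can send data with 0-RTT. Better performance on unreliable networks (mobile).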
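A back-of-the-envelope model shows why multiplexing matters. The sketch assumes many small resources, a fixed RTT, a pool of 6 connections for HTTP/1.1, and no bandwidth limits, handshake cost, or server think time, so it only counts request round trips; these are simplifying assumptions, not measurements, and the function names are illustrative.

```python
import math

def http11_fetch_ms(num_resources, rtt_ms, connections=6):
    """HTTP/1.1: each connection carries one request at a time, so requests
    go out in waves of `connections`, one RTT per wave."""
    return math.ceil(num_resources / connections) * rtt_ms

def http2_fetch_ms(num_resources, rtt_ms):
    """HTTP/2: all requests are multiplexed as streams on one connection and
    can be in flight together (one RTT, ignoring bandwidth)."""
    return rtt_ms if num_resources > 0 else 0

for n in (6, 30, 100):
    print(n, http11_fetch_ms(n, 100), http2_fetch_ms(n, 100))
# 6 100 100
# 30 500 100
# 100 1700 100
```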

WebSockets vs Long Polling

Standard HTTP is request-response: the client always initiates. But what about real-time features where the server needs to push data to the client?

Short Polling

Client repeatedly asks the server for updates at fixed intervals (e.g., every 5 seconds). Simple but wasteful -- most responses are "no new data."

Long Polling

Client sends a request, server holds it open until there is new data (or timeout). Reduces wasted requests but still HTTP overhead per message.
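A minimal long-polling sketch with `asyncio`: the handler waits on a queue until a message arrives or a timeout expires, then responds either way, and the client is expected to poll again immediately. The queue, the 30-second default, the use of 204 for "nothing new", and the function names are illustrative assumptions; a real server would do this inside its HTTP framework's request handler.

```python
import asyncio

async def long_poll(queue: asyncio.Queue, timeout: float = 30.0):
    """Hold the 'request' open until data arrives or the timeout fires."""
    try:
        message = await asyncio.wait_for(queue.get(), timeout)
        return {"status": 200, "data": message}
    except asyncio.TimeoutError:
        return {"status": 204, "data": None}  # nothing new; the client re-polls

async def demo():
    queue = asyncio.Queue()
    # No data yet: the poll is held open, then times out
    empty = await long_poll(queue, timeout=0.1)
    # Data published while a poll is waiting is returned right away
    asyncio.get_running_loop().call_later(0.05, queue.put_nowait, "new message")
    ready = await long_poll(queue, timeout=1.0)
    return empty, ready

empty, ready = asyncio.run(demo())
print(empty)  # {'status': 204, 'data': None}
print(ready)  # {'status': 200, 'data': 'new message'}
```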

WebSockets

Full-duplex, persistent connection. Both client and server can send messages at any time. Low overhead after initial handshake. Best for real-time apps.

Server-Sent Events (SSE)

One-way stream from server to client over HTTP. Simpler than WebSockets but only server-to-client. Good for live feeds, notifications, stock tickers.
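SSE is a long-lived HTTP response with `Content-Type: text/event-stream`, in which each event is a few `field: value` lines ended by a blank line. The helper below formats events in that wire format; the `id`, `event`, and `data` field names come from the SSE specification, while the function name and the `ACME` ticker payload are illustrative. A browser's `EventSource` parses these chunks and, after a reconnect, sends the last `id` back in a `Last-Event-ID` header.

```python
import json

def format_sse(data, event=None, event_id=None):
    """Encode one Server-Sent Event as text/event-stream lines."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one "data:" line per line of text
    for line in str(data).splitlines() or [""]:
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"  # a blank line terminates the event

chunk = format_sse(json.dumps({"symbol": "ACME", "price": 101.5}), event="tick", event_id=7)
print(chunk, end="")
# id: 7
# event: tick
# data: {"symbol": "ACME", "price": 101.5}
```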

websocket_server.py
import asyncio
import websockets  # pip install websockets

# Connection handlers take a single argument in websockets >= 10.1
# (older releases also passed a `path` argument)
async def chat_handler(websocket):
    """Handle one WebSocket connection for a chat app."""
    print(f"Client connected from {websocket.remote_address}")

    try:
        async for message in websocket:
            # Echo the message back (in a real app, broadcast to all clients)
            print(f"Received: {message}")
            await websocket.send(f"Server echo: {message}")
    except websockets.exceptions.ConnectionClosed:
        # Raised on an abnormal close; a clean close simply ends the loop
        print("Client disconnected")

async def main():
    async with websockets.serve(chat_handler, "localhost", 8765):
        print("WebSocket server running on ws://localhost:8765")
        await asyncio.Future()  # run until the process is stopped

if __name__ == "__main__":
    asyncio.run(main())

websocket_client.py
import asyncio
import websockets

async def client():
    async with websockets.connect("ws://localhost:8765") as ws:
        await ws.send("Hello, server!")
        response = await ws.recv()
        print(response)  # "Server echo: Hello, server!"

if __name__ == "__main__":
    asyncio.run(client())
Output
# WebSocket Server:
$ python websocket_server.py
WebSocket server running on ws://localhost:8765
Client connected from ('127.0.0.1', 52431)
Received: Hello, server!

# WebSocket Client:
$ python websocket_client.py
Server echo: Hello, server!

# Connection stays open -- no HTTP overhead per message
# Headers sent once (handshake), then pure data frames
# Typical frame overhead: 2-14 bytes vs ~800 bytes for HTTP
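The "2-14 bytes" figure follows from the frame layout in RFC 6455: a 2-byte base header, an extended length field of 2 or 8 bytes for payloads longer than 125 or 65,535 bytes, and a 4-byte masking key on every frame sent from client to server. The one-time opening handshake is an HTTP Upgrade request in which the server proves it speaks WebSocket by hashing the client's `Sec-WebSocket-Key` with a fixed GUID. Both calculations are sketched below with the standard library only; the function names are illustrative.

```python
import base64
import hashlib

WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"  # fixed value from RFC 6455

def frame_header_size(payload_len: int, masked: bool) -> int:
    """Bytes of WebSocket frame header for a payload of this length."""
    size = 2                      # FIN/opcode byte + mask-bit/7-bit-length byte
    if payload_len > 65535:
        size += 8                 # 64-bit extended payload length
    elif payload_len > 125:
        size += 2                 # 16-bit extended payload length
    if masked:
        size += 4                 # masking key (required client -> server)
    return size

def accept_key(sec_websocket_key: str) -> str:
    """Value the server returns in Sec-WebSocket-Accept during the handshake."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

print(frame_header_size(20, masked=False))      # 2  (small server -> client message)
print(frame_header_size(100_000, masked=True))  # 14 (large client -> server message)
print(accept_key("dGhlIHNhbXBsZSBub25jZQ=="))   # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The key and expected accept value in the last line are the worked example from RFC 6455 itself.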
Key Takeaway: Choose the right real-time protocol for your use case: WebSockets for bidirectional communication (chat, gaming), SSE for server-to-client streams (notifications, live feeds), and long polling only as a fallback for environments that block WebSocket connections.
Deep Dive: TCP Three-Way Handshake and Why It Matters for Performance

Every TCP connection starts with a three-way handshake: (1) the client sends SYN, (2) the server responds with SYN-ACK, (3) the client sends ACK. This costs one full round-trip time (RTT) before any data can be sent. For a user in New York connecting to a server in Tokyo (~150ms RTT), the handshake alone costs 150ms. HTTPS adds another 1-2 RTTs for the TLS handshake (1 with TLS 1.3, 2 with TLS 1.2), for a total of 300-450ms before the first byte of application data can be sent. This is why HTTP/2 multiplexing (reusing one TCP connection for many requests) and HTTP/3's QUIC protocol (transport and TLS handshakes combined into 1 RTT, or 0-RTT when resuming a connection) provide such dramatic performance improvements. It is also why CDNs place servers close to users: reducing the RTT from 150ms to 10ms makes the handshake nearly free.
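The arithmetic in this paragraph fits in a tiny helper. It assumes a brand-new connection, 1 RTT for the TCP handshake, and 1 RTT for TLS 1.3 or 2 for TLS 1.2, and it ignores DNS lookup, packet loss, and server processing; the function name is illustrative.

```python
def setup_delay_ms(rtt_ms: float, tls_round_trips: int = 1) -> float:
    """Delay before the first HTTP request can be sent on a new HTTPS connection."""
    tcp_round_trips = 1  # SYN -> SYN-ACK; the client's ACK can be followed directly by data
    return rtt_ms * (tcp_round_trips + tls_round_trips)

print(setup_delay_ms(150, tls_round_trips=1))  # 300  (New York -> Tokyo, TLS 1.3)
print(setup_delay_ms(150, tls_round_trips=2))  # 450  (New York -> Tokyo, TLS 1.2)
print(setup_delay_ms(10, tls_round_trips=1))   # 20   (nearby CDN edge, TLS 1.3)
```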

Common Mistake

Wrong: Using WebSockets for a REST API that only needs request-response

Why it fails: WebSockets maintain persistent connections that consume server memory and file descriptors. For 100K users with WebSocket connections, your server holds 100K open sockets even when no data flows. HTTP connections, by contrast, are pooled and closed after an idle timeout, so server resources track active requests rather than connected users.

Instead: Use HTTP/2 for request-response APIs. Reserve WebSockets for truly real-time bidirectional features (chat, live collaboration). Use SSE for one-way server push (notifications, feeds).

DNS and How It Works

DNS (Domain Name System) translates human-readable domain names (like google.com) into IP addresses (like 142.250.80.46) that computers use to find each other.

DNS Resolution Steps

  1. Browser Cache: Check if the domain was recently resolved
  2. OS Cache: Check the operating system's DNS cache
  3. Recursive Resolver: Query the recursive DNS resolver (often run by your ISP, or a public one such as 8.8.8.8)
  4. Root Server: Resolver asks a root server "who handles .com?"
  5. TLD Server: Root directs to .com TLD server, which knows the authoritative nameserver
  6. Authoritative NS: Returns the actual IP address for the domain
  7. Cache & Return: Result is cached at each level for the TTL duration
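The resolution steps can be simulated in a few lines. This is a toy: the three dictionaries stand in for the root, `.com` TLD, and authoritative servers, the zone data uses a made-up nameserver and the documentation address 192.0.2.10, network queries become dictionary lookups, and steps 1-3 are collapsed into a single resolver cache. It shows the referral chain and how the TTL cache skips it until the record expires.

```python
import time

# Hypothetical zone data (not real DNS): each level only knows the next hop
ROOT = {"com.": "tld-com-server"}
TLD = {"tld-com-server": {"example.com.": "ns1.example.com."}}
AUTHORITATIVE = {"ns1.example.com.": {"example.com.": ("192.0.2.10", 300)}}

class ToyResolver:
    """Recursive resolver with a TTL cache (step 7) that walks steps 4-6 on a miss."""

    def __init__(self, clock=time.monotonic):
        self.cache = {}    # name -> (ip, expires_at)
        self.clock = clock
        self.queries = 0   # upstream queries sent, to show what the cache saves

    def resolve(self, name):
        hit = self.cache.get(name)
        if hit and hit[1] > self.clock():
            return hit[0]                              # cache hit: no upstream queries
        tld_server = ROOT[name.split(".")[-2] + "."]   # step 4: root -> TLD server
        ns = TLD[tld_server][name]                     # step 5: TLD -> authoritative NS
        ip, ttl = AUTHORITATIVE[ns][name]              # step 6: authoritative answer
        self.queries += 3
        self.cache[name] = (ip, self.clock() + ttl)    # step 7: cache for TTL seconds
        return ip

now = [0.0]                                                # fake clock for the demo
resolver = ToyResolver(clock=lambda: now[0])
print(resolver.resolve("example.com."), resolver.queries)  # 192.0.2.10 3
print(resolver.resolve("example.com."), resolver.queries)  # 192.0.2.10 3  (cached)
now[0] = 301.0                                             # the 300 s TTL has expired
print(resolver.resolve("example.com."), resolver.queries)  # 192.0.2.10 6
```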
| DNS Record | Purpose | Example |
|---|---|---|
| A | Maps domain to IPv4 address | example.com -> 93.184.216.34 |
| AAAA | Maps domain to IPv6 address | example.com -> 2606:2800:220:1:... |
| CNAME | Alias for another domain | www.example.com -> example.com |
| MX | Mail server for the domain | example.com -> mail.example.com |
| NS | Authoritative nameserver | example.com -> ns1.example.com |
| TXT | Arbitrary text (verification, SPF) | SPF, DKIM, domain verification |
dns_lookup.py
import socket
import dns.resolver  # pip install dnspython (2.x provides dns.resolver.resolve)

# Simple DNS resolution using the OS resolver (returns a single IPv4 address)
ip_address = socket.gethostbyname("www.google.com")
print(f"Google IP: {ip_address}")

# Detailed DNS query using dnspython
def lookup_dns(domain, record_type="A"):
    """Query one record type for a domain and print the answers with their TTL."""
    try:
        answers = dns.resolver.resolve(domain, record_type)
    except dns.resolver.NXDOMAIN:
        print(f"Domain {domain} does not exist")
        return
    except dns.resolver.NoAnswer:
        print(f"No {record_type} records for {domain}")
        return
    print(f"\n{record_type} records for {domain}:")
    for rdata in answers:
        print(f"  {rdata}")
    print(f"  TTL: {answers.rrset.ttl} seconds")

# Look up different record types
lookup_dns("google.com", "A")      # IPv4 address
lookup_dns("google.com", "MX")     # Mail servers
lookup_dns("google.com", "NS")     # Name servers
lookup_dns("google.com", "TXT")    # TXT records
Output
$ python dns_lookup.py
Google IP: 142.250.80.46

A records for google.com:
  142.250.80.46
  TTL: 300 seconds

MX records for google.com:
  10 smtp.google.com
  20 smtp2.google.com
  TTL: 3600 seconds

NS records for google.com:
  ns1.google.com
  ns2.google.com
  TTL: 86400 seconds
Key Takeaway: DNS is the first thing that happens when a user visits your service. An uncached DNS lookup typically adds 20-120ms of latency to the first request. Use low TTLs (60s) before migrations and high TTLs (3600s) for stable services. In system design, DNS-based load balancing (Amazon Route 53, Cloudflare) is a common way to route users to the nearest data center.

Common Pitfall: DNS Caching Issues

Problem: You update your DNS records but users still see the old IP address.

Why: DNS records are cached at multiple levels (browser, OS, ISP resolver) for the TTL duration.

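Solution: Before a migration, lower the TTL to 60 seconds at least one full old-TTL period in advance (days ahead if the old TTL was a day or more), so every cache holding the long-TTL record has expired by the time you switch. After the switch, raise it back to hours for better performance.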
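The timing rule behind this advice is simple arithmetic: a resolver that cached the record just before a change may keep serving it for one full TTL. The helper below (names illustrative; it assumes resolvers honor TTLs, which not all do) computes the earliest safe switch time and when the last stale answer should have expired if you switch at that moment.

```python
from datetime import datetime, timedelta

def migration_plan(ttl_lowered_at: datetime, old_ttl_s: int, new_ttl_s: int):
    """Return (earliest_safe_cutover, stale_answers_gone_by) for a DNS migration."""
    # Caches may hold the record with the OLD TTL for one full old-TTL period
    earliest_cutover = ttl_lowered_at + timedelta(seconds=old_ttl_s)
    # After cutting over then, stale answers live for at most one NEW TTL
    stale_gone_by = earliest_cutover + timedelta(seconds=new_ttl_s)
    return earliest_cutover, stale_gone_by

lowered = datetime(2024, 1, 1, 9, 0)
cutover, clean = migration_plan(lowered, old_ttl_s=86_400, new_ttl_s=60)
print(cutover)  # 2024-01-02 09:00:00
print(clean)    # 2024-01-02 09:01:00
```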

Practice Problems

Easy Protocol Selection

For each use case, choose the most appropriate protocol and explain why:

  1. A real-time multiplayer game
  2. A file download service
  3. A live stock ticker dashboard
  4. A REST API for a mobile app

Consider: Does it need reliability or speed? Is it one-way or bidirectional? Is it real-time or request-response?

# 1. Real-time multiplayer game: UDP
#    - Low latency is critical (every ms matters)
#    - Missing a frame is ok (next update corrects it)
#    - TCP retransmission would cause lag spikes

# 2. File download: TCP (HTTP/HTTPS)
#    - Every byte must arrive correctly
#    - Order matters (can't reassemble a file from random chunks)
#    - Reliability is more important than speed

# 3. Live stock ticker: SSE or WebSockets
#    - Server pushes updates continuously
#    - SSE if one-way (server to client only)
#    - WebSocket if client also sends (e.g., subscribe/unsubscribe)

# 4. REST API: HTTP/2 over TCP
#    - Standard request-response pattern
#    - HTTP/2 for multiplexing (mobile has limited connections)
#    - JSON payloads, stateless

Medium Design a Chat System's Network Layer

Design the communication protocol for a chat application like WhatsApp:

  1. How do clients connect to the server?
  2. How are messages delivered in real-time?
  3. How do you handle offline users?
  4. What happens when a WebSocket connection drops?

Use WebSockets for real-time delivery, HTTP for initial auth and fetching history. Store messages for offline users and deliver when they reconnect.

# Chat System Network Architecture

# 1. Connection: WebSocket with HTTP fallback
#    - Client authenticates via HTTPS (POST /login)
#    - Upgrades to WebSocket for real-time messaging
#    - Falls back to long polling if WS is blocked

# 2. Real-time delivery:
#    - Sender -> WebSocket -> Chat Server -> WebSocket -> Receiver
#    - Server maintains a map: user_id -> ws_connection
#    - Message acknowledged with delivery receipt

# 3. Offline users:
#    - Messages stored in database with status "pending"
#    - When user reconnects, fetch all pending messages
#    - Push notification sent via APNS/FCM

# 4. Connection drops:
#    - Client implements exponential backoff reconnection
#    - Server detects drop via heartbeat/ping-pong
#    - On reconnect, sync from last received message ID

# Reconnection with exponential backoff:
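import random
import time

def reconnect_with_backoff(try_connect, max_retries=10, base_delay=1.0, max_delay=60.0):
    """Call try_connect() until it returns True, waiting exponentially longer each time.

    try_connect is a callable supplied by the caller (e.g. one that opens a new WebSocket).
    """
    for attempt in range(max_retries):
        delay = min(base_delay * 2 ** attempt, max_delay)  # 1s, 2s, 4s, ... capped at 60s
        jitter = random.uniform(0, delay * 0.1)  # spread out simultaneous reconnects
        time.sleep(delay + jitter)
        if try_connect():
            return True
    return False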
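Points 2-4 above can be sketched as a small in-memory router. Everything here is hypothetical: `FakeConnection` stands in for a WebSocket connection (anything with a `send` method), and the per-user history list stands in for the message database. It shows the `user_id -> connection` map, keeping messages for offline users, and syncing from the last received message ID on reconnect (the approach from point 4, used here instead of a per-message "pending" flag).

```python
import itertools
from collections import defaultdict

class FakeConnection:
    """Stand-in for a WebSocket connection; records what was sent."""
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)

class ChatRouter:
    def __init__(self):
        self.connections = {}             # user_id -> live connection
        self.history = defaultdict(list)  # user_id -> received messages, in order
        self._ids = itertools.count(1)    # monotonically increasing message IDs

    def connect(self, user_id, conn, last_seen_id=0):
        """Register a (re)connection and replay everything newer than last_seen_id."""
        self.connections[user_id] = conn
        for msg in self.history[user_id]:
            if msg["id"] > last_seen_id:
                conn.send(msg)

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)  # e.g. after missed heartbeats

    def send_message(self, sender, recipient, text):
        msg = {"id": next(self._ids), "from": sender, "text": text}
        self.history[recipient].append(msg)  # persist first, then deliver
        conn = self.connections.get(recipient)
        if conn is not None:
            conn.send(msg)                   # online: push immediately
        # offline: the message waits in history (a push notification would go out here)
        return msg["id"]

router = ChatRouter()
alice = FakeConnection()
router.connect("alice", alice)
router.send_message("bob", "alice", "hi")              # delivered live (id 1)
router.disconnect("alice")
router.send_message("bob", "alice", "are you there?")  # stored while offline (id 2)
alice_again = FakeConnection()
router.connect("alice", alice_again, last_seen_id=1)
print([m["text"] for m in alice_again.sent])  # ['are you there?']
```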

Easy HTTP Status Codes

Match each scenario to the correct HTTP status code:

  1. User successfully created a new account
  2. User requested a page that does not exist
  3. User's authentication token has expired
  4. The server is overloaded and cannot handle the request
  5. The resource has been permanently moved to a new URL

2xx = success, 3xx = redirect, 4xx = client error, 5xx = server error. Specific codes: 201 Created, 301 Moved, 401 Unauthorized, 404 Not Found, 503 Unavailable.

# 1. Account created:     201 Created
#    (resource was successfully created)

# 2. Page not found:       404 Not Found
#    (the requested resource doesn't exist)

# 3. Token expired:        401 Unauthorized
#    (authentication required or failed)

# 4. Server overloaded:    503 Service Unavailable
#    (server can't handle request right now)

# 5. Permanently moved:    301 Moved Permanently
#    (resource has a new URL, update bookmarks)
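Python's standard library already knows these codes: `http.HTTPStatus` maps each number to its reason phrase, and the class of a code is just its hundreds digit. A small sketch for checking the answers above (the `describe` helper and `CLASSES` labels are illustrative):

```python
from http import HTTPStatus

CLASSES = {1: "informational", 2: "success", 3: "redirect", 4: "client error", 5: "server error"}

def describe(code: int) -> str:
    status = HTTPStatus(code)  # raises ValueError for unknown codes
    return f"{status.value} {status.phrase} ({CLASSES[status.value // 100]})"

for code in (201, 404, 401, 503, 301):
    print(describe(code))
# 201 Created (success)
# 404 Not Found (client error)
# 401 Unauthorized (client error)
# 503 Service Unavailable (server error)
# 301 Moved Permanently (redirect)
```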

Quick Reference

Protocol Selection Guide

| Protocol | Layer | Best For | Trade-off |
|---|---|---|---|
| TCP | Transport | Reliable data transfer | Higher latency |
| UDP | Transport | Real-time, low latency | No delivery guarantee |
| HTTP/2 | Application | Web APIs, websites | Request-response only |
| WebSocket | Application | Real-time bidirectional | Persistent connection cost |
| SSE | Application | Server push (one-way) | No client-to-server channel |
| gRPC | Application | Service-to-service (microservices) | Not natively browser-friendly (needs gRPC-Web) |

Common HTTP Status Codes

| Code | Meaning | When to Use |
|---|---|---|
| 200 | OK | Successful GET/PUT request |
| 201 | Created | Successful POST (resource created) |
| 204 | No Content | Successful DELETE |
| 301 | Moved Permanently | URL has changed permanently |
| 304 | Not Modified | Cached version is still valid |
| 400 | Bad Request | Invalid request from client |
| 401 | Unauthorized | Authentication required |
| 403 | Forbidden | Authenticated but not authorized |
| 404 | Not Found | Resource does not exist |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Unhandled server exception |
| 503 | Service Unavailable | Server overloaded or in maintenance |