Networking & Protocols | LIZIU System Design

Why Networking Matters

The Problem: Every distributed system relies on network communication. Choosing the wrong protocol can mean the difference between a snappy user experience and a sluggish, unreliable one.

The Solution: Understanding how TCP, UDP, HTTP, WebSockets, and DNS work gives you the ability to select the right communication patterns for each part of your system.

Real Impact: Discord uses WebSockets for real-time messaging, Netflix uses HTTP for streaming, and online games use UDP for low-latency updates -- each protocol chosen for specific reasons.

The TCP/IP Network Stack

TCP vs UDP

TCP and UDP are the two main transport-layer protocols. Every application-layer protocol (HTTP, WebSocket, DNS) runs on top of one of these.

TCP (Transmission Control Protocol)

Reliable, ordered delivery. Establishes a connection via three-way handshake. Guarantees every byte arrives intact and in order. Used by HTTP, HTTPS, SSH, SMTP.

UDP (User Datagram Protocol)

Fast, unreliable delivery. No connection setup, no delivery guarantee. Packets may arrive out of order or not at all. Used by DNS, video streaming, online games, VoIP.

Feature	TCP	UDP
Connection	Connection-oriented (handshake)	Connectionless
Reliability	Guaranteed delivery, retransmission	Best-effort, no guarantees
Ordering	Data is delivered to the application in order	No ordering guarantee
Speed	Slower (overhead from guarantees)	Faster (minimal overhead)
Header Size	20-60 bytes	8 bytes
Flow Control	Yes (sliding window)	No
Use Cases	Web, email, file transfer	Gaming, streaming, DNS, IoT
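The differences in the table show up directly in the socket API. The sketch below is a minimal illustration, assuming loopback networking is available on the machine; the helper names `tcp_echo_once` and `udp_echo_once` are hypothetical and exist only for this example. The TCP version has to listen, accept, and connect before any data moves, while the UDP version simply sends a datagram to an address.

```python
import socket
import threading


def tcp_echo_once(payload: bytes) -> bytes:
    """Send payload over a TCP connection on loopback and return the echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()      # returns once a client has connected
        with conn:
            conn.sendall(conn.recv(1024))

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)        # connection-oriented byte stream
        reply = client.recv(1024)
    worker.join()
    server.close()
    return reply


def udp_echo_once(payload: bytes) -> bytes:
    """Send payload as a single UDP datagram on loopback and return the echo."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))
    port = server.getsockname()[1]

    def serve():
        data, addr = server.recvfrom(1024)   # no accept(): there is no connection
        server.sendto(data, addr)

    worker = threading.Thread(target=serve, daemon=True)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client.settimeout(2.0)             # UDP gives no delivery guarantee, so never wait forever
    client.sendto(payload, ("127.0.0.1", port))
    reply, _ = client.recvfrom(1024)
    client.close()
    worker.join()
    server.close()
    return reply


print(tcp_echo_once(b"ping"))
print(udp_echo_once(b"ping"))
```

On loopback both calls succeed; across a real network only the TCP version would retransmit a lost packet, which is why the UDP client sets its own timeout.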

Real-World Analogy

TCP is like sending a registered letter: you get confirmation it was delivered, it arrives in order, and if it gets lost, it is resent.

UDP is like shouting across a room: fast, but some words might get lost, and there is no confirmation the listener heard you.

HTTP/HTTPS and HTTP/2

HTTP (Hypertext Transfer Protocol) is the foundation of web communication. It follows a request-response pattern and, up through HTTP/2, runs on top of TCP.

HTTP Request/Response Flow

HTTP Methods

Method	Purpose	Idempotent?	Has Body?
GET	Retrieve a resource	Yes	No
POST	Create a new resource	No	Yes
PUT	Replace a resource entirely	Yes	Yes
PATCH	Partially update a resource	No	Yes
DELETE	Remove a resource	Yes	Optional
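The "Idempotent?" column matters most when a client retries a request after a timeout. The sketch below is a toy in-memory model, not a real server; the `UserStore` class and its method names are hypothetical. It shows that repeating a POST creates a duplicate, while repeating a PUT or DELETE leaves the same state.

```python
class UserStore:
    """Toy in-memory resource collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.users = {}
        self.next_id = 1

    def post(self, data):
        """Model of POST /users: every call creates a new resource."""
        user_id = self.next_id
        self.next_id += 1
        self.users[user_id] = dict(data)
        return user_id

    def put(self, user_id, data):
        """Model of PUT /users/{id}: replaces the resource entirely."""
        self.users[user_id] = dict(data)

    def delete(self, user_id):
        """Model of DELETE /users/{id}: deleting twice leaves the same state."""
        self.users.pop(user_id, None)


store = UserStore()
store.post({"name": "Bob"})
store.post({"name": "Bob"})            # retried POST: a duplicate is created
print(len(store.users))                # 2

store.put(1, {"name": "Bob Smith"})
store.put(1, {"name": "Bob Smith"})    # retried PUT: state is unchanged
print(store.users[1])                  # {'name': 'Bob Smith'}
```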

http_requests.py

import requests

# GET request - retrieve data
response = requests.get(
    "https://api.example.com/users",
    headers={"Authorization": "Bearer token123"}
)
print(response.status_code)  # 200
print(response.json())       # [{"id": 1, "name": "Alice"}, ...]

# POST request - create data
new_user = {"name": "Bob", "email": "bob@example.com"}  # placeholder address
response = requests.post(
    "https://api.example.com/users",
    json=new_user,
    headers={"Content-Type": "application/json"}
)
print(response.status_code)  # 201 Created

# PUT request - update entire resource
updated_user = {"name": "Bob Smith", "email": "bob@example.com"}  # placeholder address
response = requests.put(
    "https://api.example.com/users/2",
    json=updated_user
)

# DELETE request - remove resource
response = requests.delete("https://api.example.com/users/2")
print(response.status_code)  # 204 No Content

HTTP/1.1 vs HTTP/2 vs HTTP/3

HTTP/1.1

Each TCP connection handles one request at a time, so a slow response delays every request queued behind it (head-of-line blocking). Browsers open 6-8 parallel connections per host as a workaround. Text-based headers are sent uncompressed with every request.

HTTP/2

Multiplexing: many requests share one TCP connection. Binary protocol (more efficient). Header compression with HPACK. Server push (part of the spec, though browsers have largely dropped support for it). Used by most modern websites.

HTTP/3

Runs on QUIC (UDP-based) instead of TCP. Eliminates TCP head-of-line blocking: a lost packet stalls only the stream it belongs to. Faster connection setup (1-RTT for a new connection, 0-RTT when resuming). Better performance on unreliable networks (mobile).
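To make the benefit of multiplexing concrete, here is a deliberately idealized timing model. It assumes each request has a fixed duration, ignores bandwidth sharing, handshakes, and packet loss, and uses hypothetical function names. It is a back-of-the-envelope sketch, not a benchmark.

```python
import heapq


def http1_total_time(durations, connections=6):
    """HTTP/1.1 model: each connection serves one request at a time.

    Every request is assigned to whichever connection frees up first.
    """
    free_at = [0.0] * connections
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)     # earliest available connection
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


def http2_total_time(durations):
    """HTTP/2 model: all requests are in flight at once on one connection."""
    return max(durations, default=0.0)


durations = [1.0] * 12                     # twelve requests, one second each
print(http1_total_time(durations, connections=1))   # 12.0
print(http1_total_time(durations, connections=6))   # 2.0
print(http2_total_time(durations))                  # 1.0
```

In practice multiplexed streams share the connection's bandwidth, so real gains are smaller than this model suggests.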

WebSockets vs Long Polling

Standard HTTP is request-response: the client always initiates. But what about real-time features where the server needs to push data to the client?

Short Polling

Client repeatedly asks the server for updates at fixed intervals (e.g., every 5 seconds). Simple but wasteful -- most responses are "no new data."

Long Polling

Client sends a request, and the server holds it open until there is new data (or a timeout). This reduces wasted requests but still incurs HTTP overhead for every message.
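The "hold the request open" behavior can be sketched without any web framework. In the minimal example below, a `queue.Queue` stands in for the server's source of updates and `long_poll` is a hypothetical handler: it blocks until data arrives or the timeout expires, after which a real client would immediately send the next request.

```python
import queue
import threading


def long_poll(updates: queue.Queue, timeout: float):
    """Block until an update arrives or the timeout expires.

    Returns the update, or None on timeout (the client would then re-request).
    """
    try:
        return updates.get(timeout=timeout)
    except queue.Empty:
        return None


updates = queue.Queue()

# Simulate new data appearing on the server 0.1 s after the client asks.
threading.Timer(0.1, updates.put, args=("new message",)).start()

print(long_poll(updates, timeout=2.0))   # new message
print(long_poll(updates, timeout=0.2))   # None (timed out, nothing new)
```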

WebSockets

Full-duplex, persistent connection. Both client and server can send messages at any time. Low overhead after initial handshake. Best for real-time apps.
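The initial handshake is an ordinary HTTP request with an `Upgrade: websocket` header. The client sends a random `Sec-WebSocket-Key`, and the server proves it understood the upgrade by replying with a `Sec-WebSocket-Accept` value derived from it. The sketch below computes that value as defined in RFC 6455; the function name is hypothetical, and the sample key is the one used in the RFC itself.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; it is the same for every WebSocket server.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a client key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Libraries such as `websockets` perform this step automatically; it is shown here only to illustrate what the handshake does before the connection switches to WebSocket frames.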

Server-Sent Events (SSE)

One-way stream from server to client over HTTP. Simpler than WebSockets but only server-to-client. Good for live feeds, notifications, stock tickers.
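Part of what makes SSE simple is its wire format: the response body is a text stream (`Content-Type: text/event-stream`) in which each event is a few `field: value` lines followed by a blank line. The helper below is a minimal sketch of that framing; the name `format_sse` and its parameters are assumptions made for this example.

```python
def format_sse(data, event=None, event_id=None):
    """Frame one Server-Sent Event as text for an event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as several consecutive "data:" lines.
    for line in data.split("\n"):
        lines.append(f"data: {line}")
    return "\n".join(lines) + "\n\n"       # a blank line terminates the event


print(repr(format_sse("AAPL 189.50", event="price")))
# 'event: price\ndata: AAPL 189.50\n\n'
```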

websocket_example.py

import asyncio
import websockets  # pip install websockets

# ---- WebSocket Server ----
# Recent websockets releases pass only the connection object to the handler;
# older releases also passed a `path` argument.
async def chat_handler(websocket):
    """Handle a WebSocket connection for a chat app."""
    print(f"Client connected from {websocket.remote_address}")

    try:
        async for message in websocket:
            # Echo the message back (in a real app, broadcast to all clients)
            print(f"Received: {message}")
            await websocket.send(f"Server echo: {message}")
    except websockets.exceptions.ConnectionClosed:
        print("Client disconnected")

# Start the server and keep it running until it is closed
async def main():
    server = await websockets.serve(chat_handler, "localhost", 8765)
    print("WebSocket server running on ws://localhost:8765")
    await server.wait_closed()

# ---- WebSocket Client (run from a separate script or process) ----
async def client():
    async with websockets.connect("ws://localhost:8765") as ws:
        await ws.send("Hello, server!")
        response = await ws.recv()
        print(response)  # "Server echo: Hello, server!"

if __name__ == "__main__":
    # Server process. In the client script, call asyncio.run(client()) instead.
    asyncio.run(main())

DNS and How It Works

DNS (Domain Name System) translates human-readable domain names (like google.com) into IP addresses (like 142.250.80.46) that computers use to find each other.

DNS Resolution Steps

Browser Cache: Check if the domain was recently resolved
OS Cache: Check the operating system's DNS cache
Resolver (ISP): Query the recursive DNS resolver
Root Server: Resolver asks a root server "who handles .com?"
TLD Server: Root directs to .com TLD server, which knows the authoritative nameserver
Authoritative NS: Returns the actual IP address for the domain
Cache & Return: Result is cached at each level for the TTL duration
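The caching in these steps follows the same rule at every level: answer from the cache while the entry is younger than its TTL, otherwise ask the next level up and remember the result. Below is a minimal sketch of one such cache layer. The class name, the `upstream` callable, and the injectable clock are all assumptions made so the example can run without real DNS traffic.

```python
import time


class TtlCache:
    """One caching layer (browser, OS, or resolver) in front of an upstream lookup."""

    def __init__(self, upstream, clock=time.monotonic):
        self.upstream = upstream       # callable: name -> (ip, ttl_seconds)
        self.clock = clock
        self.entries = {}              # name -> (ip, expires_at)

    def resolve(self, name):
        now = self.clock()
        cached = self.entries.get(name)
        if cached and cached[1] > now:
            return cached[0]           # cache hit: no upstream query
        ip, ttl = self.upstream(name)  # cache miss or expired: ask upstream
        self.entries[name] = (ip, now + ttl)
        return ip


calls = []

def fake_authoritative(name):
    """Stand-in for the real resolution chain; records every query it receives."""
    calls.append(name)
    return "93.184.216.34", 300


now = [0.0]                            # fake clock so the example runs instantly
cache = TtlCache(fake_authoritative, clock=lambda: now[0])
cache.resolve("example.com")           # miss: queries upstream
cache.resolve("example.com")           # hit: served from cache
now[0] = 301.0                         # TTL of 300 s has passed
cache.resolve("example.com")           # expired: queries upstream again
print(len(calls))                      # 2
```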

DNS Record	Purpose	Example
A	Maps domain to IPv4 address	example.com -> 93.184.216.34
AAAA	Maps domain to IPv6 address	example.com -> 2606:2800:220:1:...
CNAME	Alias for another domain	www.example.com -> example.com
MX	Mail server for the domain	example.com -> mail.example.com
NS	Authoritative nameserver	example.com -> ns1.example.com
TXT	Arbitrary text (SPF, DKIM, domain verification)	example.com -> "v=spf1 -all"

dns_lookup.py

import socket
import dns.resolver  # pip install dnspython

# Simple DNS resolution using socket
ip_address = socket.gethostbyname("www.google.com")
print(f"Google IP: {ip_address}")

# Detailed DNS query using dnspython
def lookup_dns(domain, record_type="A"):
    """Query DNS records for a domain."""
    try:
        answers = dns.resolver.resolve(domain, record_type)
        print(f"\n{record_type} records for {domain}:")
        for rdata in answers:
            print(f"  {rdata}")
        print(f"  TTL: {answers.rrset.ttl} seconds")
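    except dns.resolver.NXDOMAIN:
        print(f"Domain {domain} does not exist")
    except dns.resolver.NoAnswer:
        # The domain exists but has no records of the requested type
        print(f"{domain} has no {record_type} records")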

# Look up different record types
lookup_dns("google.com", "A")      # IPv4 address
lookup_dns("google.com", "MX")     # Mail servers
lookup_dns("google.com", "NS")     # Name servers
lookup_dns("google.com", "TXT")    # TXT records

# Example output (actual addresses and TTLs vary):
# A records for google.com:
#   142.250.80.46
#   TTL: 300 seconds

Common Pitfall: DNS Caching Issues

Problem: You update your DNS records but users still see the old IP address.

Why: DNS records are cached at multiple levels (browser, OS, ISP resolver) for the TTL duration.

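Solution: Before a migration, lower the TTL to about 60 seconds, and do it at least one full old-TTL period (often a day or more) ahead of the switch so that caches holding the old, long TTL have time to expire. After the switch, raise the TTL back to hours for better performance.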
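The lead time follows from simple arithmetic: a resolver that cached the record just before the TTL was lowered will keep it for the full old TTL. A minimal sketch, with a hypothetical function name and made-up dates:

```python
from datetime import datetime, timedelta


def latest_ttl_change(migration_at: datetime, old_ttl_seconds: int) -> datetime:
    """Latest moment to lower the TTL so that every cache has expired
    its long-TTL copy by the time of the migration."""
    return migration_at - timedelta(seconds=old_ttl_seconds)


migration = datetime(2030, 1, 10, 12, 0)
print(latest_ttl_change(migration, old_ttl_seconds=86400))  # 2030-01-09 12:00:00
```

With a 24-hour TTL, the change has to be made at least a day before the switch; lowering it earlier adds a safety margin for resolvers that do not honor TTLs exactly.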

Practice Problems

Easy Protocol Selection

For each use case, choose the most appropriate protocol and explain why:

1. A real-time multiplayer game
2. A file download service
3. A live stock ticker dashboard
4. A REST API for a mobile app

Consider: Does it need reliability or speed? Is it one-way or bidirectional? Is it real-time or request-response?

# 1. Real-time multiplayer game: UDP
#    - Low latency is critical (every ms matters)
#    - Missing a frame is ok (next update corrects it)
#    - TCP retransmission would cause lag spikes

# 2. File download: TCP (HTTP/HTTPS)
#    - Every byte must arrive correctly
#    - Order matters (can't reassemble a file from random chunks)
#    - Reliability is more important than speed

# 3. Live stock ticker: SSE or WebSockets
#    - Server pushes updates continuously
#    - SSE if one-way (server to client only)
#    - WebSocket if client also sends (e.g., subscribe/unsubscribe)

# 4. REST API: HTTP/2 over TCP
#    - Standard request-response pattern
#    - HTTP/2 for multiplexing (mobile has limited connections)
#    - JSON payloads, stateless

Medium Design a Chat System's Network Layer

Design the communication protocol for a chat application like WhatsApp:

1. How do clients connect to the server?
2. How are messages delivered in real-time?
3. How do you handle offline users?
4. What happens when a WebSocket connection drops?

Use WebSockets for real-time delivery, HTTP for initial auth and fetching history. Store messages for offline users and deliver when they reconnect.

# Chat System Network Architecture

# 1. Connection: WebSocket with HTTP fallback
#    - Client authenticates via HTTPS (POST /login)
#    - Upgrades to WebSocket for real-time messaging
#    - Falls back to long polling if WS is blocked

# 2. Real-time delivery:
#    - Sender -> WebSocket -> Chat Server -> WebSocket -> Receiver
#    - Server maintains a map: user_id -> ws_connection
#    - Message acknowledged with delivery receipt

# 3. Offline users:
#    - Messages stored in database with status "pending"
#    - When user reconnects, fetch all pending messages
#    - Push notification sent via APNS/FCM

# 4. Connection drops:
#    - Client implements exponential backoff reconnection
#    - Server detects drop via heartbeat/ping-pong
#    - On reconnect, sync from last received message ID

# Reconnection with exponential backoff:
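import random
import time

def reconnect_with_backoff(try_connect, max_retries=10):
    """Retry try_connect (a caller-supplied function that returns True on success)."""
    for attempt in range(max_retries):
        delay = min(2 ** attempt, 60)  # 1s, 2s, 4s, ... capped at 60s
        jitter = random.uniform(0, delay * 0.1)  # up to 10% extra so clients don't retry in lockstep
        time.sleep(delay + jitter)
        if try_connect():
            return True
    return False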
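Points 2 and 3 of the design above (the `user_id -> ws_connection` map and pending messages for offline users) can be sketched in a few lines. This is a minimal in-memory model under stated assumptions: the `MessageRouter` class is hypothetical, a plain callable stands in for each WebSocket connection, and a dictionary of queues stands in for the database.

```python
from collections import defaultdict, deque


class MessageRouter:
    """Routes messages to connected users and queues them for offline users."""

    def __init__(self):
        self.connections = {}               # user_id -> send callable (stand-in for a WebSocket)
        self.pending = defaultdict(deque)   # user_id -> messages awaiting delivery

    def connect(self, user_id, send):
        """Register a connection and flush anything queued while the user was offline."""
        self.connections[user_id] = send
        queued = self.pending.pop(user_id, deque())
        while queued:
            send(queued.popleft())

    def disconnect(self, user_id):
        self.connections.pop(user_id, None)

    def deliver(self, to_user, message):
        """Deliver immediately if the user is online, otherwise store as pending."""
        send = self.connections.get(to_user)
        if send is None:
            self.pending[to_user].append(message)
            return "pending"
        send(message)
        return "delivered"


router = MessageRouter()
inbox = []                                   # what Bob's client has received
print(router.deliver("bob", "hi"))           # pending (Bob is offline)
router.connect("bob", inbox.append)          # reconnect flushes the queue
print(router.deliver("bob", "again"))        # delivered
print(inbox)                                 # ['hi', 'again']
```

A production system would persist pending messages and track delivery receipts; those parts are left out of this sketch.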

Easy HTTP Status Codes

Match each scenario to the correct HTTP status code:

1. User successfully created a new account
2. User requested a page that does not exist
3. User's authentication token has expired
4. The server is overloaded and cannot handle the request
5. The resource has been permanently moved to a new URL

2xx = success, 3xx = redirect, 4xx = client error, 5xx = server error. Specific codes: 201 Created, 301 Moved, 401 Unauthorized, 404 Not Found, 503 Unavailable.

# 1. Account created:     201 Created
#    (resource was successfully created)

# 2. Page not found:       404 Not Found
#    (the requested resource doesn't exist)

# 3. Token expired:        401 Unauthorized
#    (authentication required or failed)

# 4. Server overloaded:    503 Service Unavailable
#    (server can't handle request right now)

# 5. Permanently moved:    301 Moved Permanently
#    (resource has a new URL, update bookmarks)
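The class of a status code is determined by its first digit, which makes it easy to check in code. The sketch below uses the standard library's `http.HTTPStatus` for the reason phrases; the `status_class` helper is a hypothetical name introduced for this example.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a status code to its class using the first digit."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirect",
        4: "client error",
        5: "server error",
    }
    return classes.get(code // 100, "unknown")


for code in (201, 404, 401, 503, 301):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
# 201 Created -> success
# 404 Not Found -> client error
# 401 Unauthorized -> client error
# 503 Service Unavailable -> server error
# 301 Moved Permanently -> redirect
```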

Quick Reference

Protocol Selection Guide

Protocol	Layer	Best For	Trade-off
TCP	Transport	Reliable data transfer	Higher latency
UDP	Transport	Real-time, low latency	No delivery guarantee
HTTP/2	Application	Web APIs, websites	Request-response only
WebSocket	Application	Real-time bidirectional	Persistent connection cost
SSE	Application	Server push (one-way)	No client-to-server
gRPC	Application	Service-to-service (microservices)	Not browser-friendly

Common HTTP Status Codes

Code	Meaning	When to Use
200	OK	Successful GET/PUT request
201	Created	Successful POST (resource created)
204	No Content	Successful DELETE
301	Moved Permanently	URL has changed permanently
304	Not Modified	Cached version is still valid
400	Bad Request	Invalid request from client
401	Unauthorized	Authentication required
403	Forbidden	Authenticated but not authorized
404	Not Found	Resource does not exist
429	Too Many Requests	Rate limit exceeded
500	Internal Server Error	Unhandled server exception
503	Service Unavailable	Server overloaded or in maintenance
