What is Event Streaming?
Imagine every action in your application — a user signs up, places an order, clicks a button — is recorded as an event. Event streaming is the practice of capturing these events in real-time and making them available to other systems. It's like a live broadcast of everything happening in your business.
Apache Kafka is the most popular event streaming platform. Originally built at LinkedIn to handle its activity data (trillions of messages per day), it is now reportedly used by more than 80% of Fortune 100 companies, including Netflix, Uber, Airbnb, and Goldman Sachs.
Kafka in one sentence
Kafka is a distributed, fault-tolerant, high-throughput platform for building real-time data pipelines and streaming applications.
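At its core, Kafka models a topic as a set of append-only partitions, where every record gets a monotonically increasing offset and records with the same key always land in the same partition. The toy model below (plain Python, no Kafka required; the `Topic` class and its methods are illustrative names, not a real client API) sketches just that abstraction:

```python
# Toy model of Kafka's core abstraction: a topic is a set of append-only
# partitions, and every record gets a monotonically increasing offset.
# Illustration only -- real Kafka distributes partitions across brokers
# and persists them to disk.

class Topic:
    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Records with the same key always land in the same partition,
        # which is what gives Kafka its per-partition ordering guarantee.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

orders = Topic()
p1, o1 = orders.produce("user-42", "signed_up")
p2, o2 = orders.produce("user-42", "placed_order")
assert p1 == p2      # same key -> same partition, so order is preserved
assert o2 == o1 + 1  # offsets grow monotonically within a partition
```

Real producers use a configurable partitioner (hash of the key by default), but the consequence is the same: per-key ordering within a partition.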
Kafka vs Traditional Message Queues
Kafka is often compared to RabbitMQ or AWS SQS, but it's fundamentally different. Traditional queues delete messages after consumption. Kafka retains messages for a configurable period, letting multiple consumers read the same data independently.
| Feature | Kafka | RabbitMQ / SQS |
|---|---|---|
| Message retention | Configurable (days/weeks/forever) | Deleted after consumption |
| Consumer model | Pull (consumers control pace) | Push (broker sends to consumer) |
| Replay | Yes — re-read any offset | No — once consumed, gone |
| Throughput | Millions of msg/sec | Thousands of msg/sec |
| Ordering | Per-partition guaranteed | Best effort or FIFO queues |
| Best for | Event streaming, log aggregation, data pipelines | Task queues, RPC, simple pub/sub |
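The retention and replay rows in the table are the crucial difference, and they follow from one design choice: consumers track their own position (offset) in the log instead of the broker deleting what was read. A minimal sketch, with a plain list standing in for one partition's retained records and a hypothetical `Consumer` class standing in for a real client:

```python
# Two consumers reading the same partition independently: each tracks its
# own offset, so one can replay from the beginning while another only
# tails new records. In a traditional queue, the first read would have
# removed the message for everyone.

log = ["evt-0", "evt-1", "evt-2", "evt-3"]  # one partition's retained records

class Consumer:
    def __init__(self, start_offset=0):
        self.offset = start_offset

    def poll(self, log):
        records = log[self.offset:]
        self.offset = len(log)
        return records

    def seek(self, offset):
        self.offset = offset  # replay: rewind to any retained offset

analytics = Consumer()                 # reads everything from offset 0
alerting = Consumer(start_offset=3)    # only cares about new records

assert analytics.poll(log) == ["evt-0", "evt-1", "evt-2", "evt-3"]
assert alerting.poll(log) == ["evt-3"]

analytics.seek(2)  # re-read from offset 2 -- not possible once SQS deletes
assert analytics.poll(log) == ["evt-2", "evt-3"]
```

Real Kafka consumers commit offsets back to the broker for crash recovery, but the broker never pushes or deletes on their behalf.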
Core Use Cases
Log Aggregation
Collect logs from hundreds of servers into a central Kafka topic. Process with Elasticsearch, Splunk, or Datadog.
Event Sourcing
Store every state change as an event. Rebuild application state by replaying the log. Used by banks and trading platforms.
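"Rebuild state by replaying the log" sounds abstract, but it is just a fold over the event stream. A miniature sketch with a made-up bank-account example (the event names are illustrative, not a standard schema):

```python
# Event sourcing in miniature: the event log is the source of truth, and
# current state is a fold (reduce) over it. Replaying the same log always
# rebuilds the same state.
from functools import reduce

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 50},
]

def apply_event(balance, event):
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance  # ignore unknown event types

balance = reduce(apply_event, events, 0)
assert balance == 120  # 100 - 30 + 50
```

Because the log is immutable, you also get an audit trail for free, which is why banks and trading platforms favor this pattern.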
Stream Processing
Transform, filter, and aggregate data in real-time with Kafka Streams or ksqlDB. Fraud detection, recommendations, analytics.
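A stream-processing topology is essentially filter plus keyed aggregation. The sketch below shows that shape in plain Python over an in-memory batch; a real Kafka Streams or ksqlDB job would run the same logic continuously against a windowed state store. The "more than two small transactions" rule is a deliberately crude, made-up fraud signal:

```python
# Stream processing in miniature: filter + keyed aggregation, the same
# shape a Kafka Streams topology would have. We flag cards with more than
# two small transactions in the batch -- a crude illustrative fraud rule.
from collections import Counter

stream = [
    {"card": "A", "amount": 20},
    {"card": "B", "amount": 500},
    {"card": "A", "amount": 35},
    {"card": "A", "amount": 15},
]

# filter: keep small, rapid-fire transactions
small = (tx for tx in stream if tx["amount"] < 100)

# aggregate: count per card (a real job would use a time-windowed store)
counts = Counter(tx["card"] for tx in small)
flagged = [card for card, n in counts.items() if n > 2]
assert flagged == ["A"]
```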
Data Integration
Connect databases, APIs, and file systems with Kafka Connect. Sync data between systems without point-to-point integrations.
Metrics & Monitoring
Collect application metrics in real-time. Feed into Prometheus, Grafana, or custom dashboards.
Change Data Capture
Capture every database change (INSERT, UPDATE, DELETE) as an event with Debezium. Keep derived stores in sync.
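Keeping a derived store in sync means consuming those change events and applying them in order. The sketch below uses an event shape loosely modeled on Debezium's envelope (`op`, `before`, `after`) but heavily simplified; the dict standing in for a cache is illustrative:

```python
# Change data capture in miniature: each database change arrives as an
# event and is applied to a derived store (here, a dict acting as a
# cache). Event shape loosely modeled on Debezium's envelope, simplified.

changes = [
    {"op": "c", "after": {"id": 1, "email": "a@example.com"}},  # INSERT
    {"op": "u", "before": {"id": 1},
                "after": {"id": 1, "email": "b@example.com"}},  # UPDATE
    {"op": "d", "before": {"id": 1}},                           # DELETE
]

cache = {}
for ev in changes:
    if ev["op"] in ("c", "u"):            # create/update: upsert the row
        cache[ev["after"]["id"]] = ev["after"]
    elif ev["op"] == "d":                 # delete: evict the row
        cache.pop(ev["before"]["id"], None)

assert cache == {}  # insert, then update, then delete leaves it empty
```

Because events for one row share a key, they land in one partition and are applied in order, which is exactly the per-partition ordering guarantee from the table above.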
When NOT to Use Kafka
Kafka is powerful but not the right tool for everything:
- Simple task queues — If you just need workers to process jobs, RabbitMQ or Redis is simpler.
- Small scale — If you process a few hundred messages per minute, Kafka's operational complexity isn't worth it.
- Request-response — Kafka is async. For synchronous API calls, use HTTP/gRPC directly.
- Exactly-once delivery without careful design — Kafka supports exactly-once semantics, but it requires deliberate architecture (idempotent producers, transactions). Don't assume it "just works."
🔍 Deep Dive: Kafka's Origin Story
Kafka was created at LinkedIn in 2010 by Jay Kreps, Neha Narkhede, and Jun Rao to solve a specific problem: LinkedIn had dozens of systems that needed to share data (activity feeds, search indexing, recommendations, monitoring) but connecting them point-to-point created an unmaintainable web of integrations. Kafka provided a central nervous system — every system publishes events to Kafka, and every system reads the events it needs. The name "Kafka" was chosen because the system is "optimized for writing" (like the author Franz Kafka). It was open-sourced in 2011 and became an Apache project in 2012.
Practice Exercises
Easy: Hello World Variant
Modify the example to accept user input and print a personalized greeting.
Easy: Code Reading
Read through the code examples above and predict the output before running them.
Medium: Extend the Example
Take one code example and add error handling, input validation, or a new feature.