What is Apache Kafka?

Understand event streaming, Kafka's role in modern architecture, and core use cases.

Beginner 20 min read 📨 Kafka

What is Event Streaming?

Imagine every action in your application — a user signs up, places an order, clicks a button — is recorded as an event. Event streaming is the practice of capturing these events in real time and making them available to other systems. It's like a live broadcast of everything happening in your business.

Apache Kafka is the most popular event streaming platform. Originally built at LinkedIn to handle their activity feed (trillions of messages per day), it's now used by 80% of Fortune 100 companies including Netflix, Uber, Airbnb, and Goldman Sachs.

Kafka in one sentence

Kafka is a distributed, fault-tolerant, high-throughput platform for building real-time data pipelines and streaming applications.

Kafka APIs: Producer API, Consumer API, Streams API, Connect API
Kafka's four core APIs — Source: kafka.apache.org (Apache 2.0)

Kafka vs Traditional Message Queues

Kafka is often compared to RabbitMQ or AWS SQS, but it's fundamentally different. Traditional queues delete messages after consumption. Kafka retains messages for a configurable period, letting multiple consumers read the same data independently.
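This retention-and-replay difference can be sketched with a toy in-memory log. This is illustrative Python only, not real Kafka client code: messages are appended and kept, and each consumer tracks its own offset, so the same data can be read many times.

```python
# Toy illustration of Kafka's log model (NOT a real Kafka client):
# the "topic" is an append-only list, and reads never delete anything.

log = []  # the retained log — nothing is removed on consumption

def produce(value):
    log.append(value)

def consume(offset):
    """Read everything from `offset` onward; the log is untouched."""
    return log[offset:]

produce("signup:alice")
produce("order:42")

# Two independent consumers read the same data...
analytics = consume(0)
audit = consume(0)
# ...and a replay from any offset remains possible later.
replay = consume(1)
```

A traditional queue, by contrast, would hand each message to one consumer and then delete it, making the second read and the replay impossible.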

| Feature | Kafka | RabbitMQ / SQS |
| --- | --- | --- |
| Message retention | Configurable (days/weeks/forever) | Deleted after consumption |
| Consumer model | Pull (consumers control pace) | Push (broker sends to consumer) |
| Replay | Yes — re-read any offset | No — once consumed, gone |
| Throughput | Millions of msg/sec | Thousands of msg/sec |
| Ordering | Per-partition guaranteed | Best effort or FIFO queues |
| Best for | Event streaming, log aggregation, data pipelines | Task queues, RPC, simple pub/sub |
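The per-partition ordering guarantee can also be sketched as a toy: records with the same key hash to the same partition, so events for one key stay in order. This is a simplified stand-in, not Kafka's actual partitioner (which uses murmur2 hashing).

```python
# Toy sketch of key-based partitioning (NOT Kafka's real partitioner):
# same key -> same partition -> per-key ordering is preserved.

NUM_PARTITIONS = 3
partitions = [[] for _ in range(NUM_PARTITIONS)]

def send(key, value):
    # Kafka's default partitioner uses murmur2; plain hash() is a stand-in.
    p = hash(key) % NUM_PARTITIONS
    partitions[p].append((key, value))
    return p

# All events for one order land in the same partition, in send order.
p1 = send("order-42", "created")
p2 = send("order-42", "paid")
p3 = send("order-42", "shipped")
assert p1 == p2 == p3
```

Ordering across *different* keys (and thus different partitions) is not guaranteed, which is why the table above says "per-partition".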

Core Use Cases

Log Aggregation

Collect logs from hundreds of servers into a central Kafka topic. Process with Elasticsearch, Splunk, or Datadog.

Event Sourcing

Store every state change as an event. Rebuild application state by replaying the log. Used by banks and trading platforms.
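A toy sketch of the idea, assuming a simple bank-account event log (all field names invented): state is never stored directly; it is rebuilt by folding over the events.

```python
# Toy event-sourcing sketch (illustrative, not production code):
# the event log is the source of truth; current state is derived.

events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 50},
]

def rebuild_balance(log):
    """Replay the log from the start to recompute current state."""
    balance = 0
    for e in log:
        if e["type"] == "deposited":
            balance += e["amount"]
        elif e["type"] == "withdrawn":
            balance -= e["amount"]
    return balance

balance = rebuild_balance(events)  # 120
```

Replaying a prefix of the log reconstructs the state at any earlier point in time, which is exactly why retention matters for this pattern.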

Stream Processing

Transform, filter, and aggregate data in real time with Kafka Streams or ksqlDB. Typical applications: fraud detection, recommendations, analytics.
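A minimal sketch of what a filter/group/count pipeline expresses, written in plain Python over a finite list rather than the actual Kafka Streams API (the event fields are made up):

```python
# Toy stream-processing pipeline: filter -> groupBy -> count.
# In Kafka Streams this would run continuously over an unbounded topic.

from collections import Counter

stream = [
    {"user": "alice", "action": "click"},
    {"user": "bob",   "action": "view"},
    {"user": "alice", "action": "click"},
]

clicks = (e for e in stream if e["action"] == "click")  # filter
counts = Counter(e["user"] for e in clicks)             # groupBy + count
```

The real difference is that Kafka Streams maintains `counts` as continuously updated state over an unbounded stream, rather than a one-shot computation.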

Data Integration

Connect databases, APIs, and file systems with Kafka Connect. Sync data between systems without point-to-point integrations.
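To make this concrete, a Kafka Connect connector is defined declaratively. The fragment below is a hypothetical JDBC source connector that streams new rows from an `orders` table into a topic; the property names follow the Confluent JDBC connector, while the host, database, and table values are placeholders.

```json
{
  "name": "orders-db-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db.example.com:5432/shop",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "orders",
    "topic.prefix": "db-"
  }
}
```

No application code is needed: you POST this JSON to the Connect REST API and the framework handles polling, offsets, and delivery.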

Metrics & Monitoring

Collect application metrics in real time. Feed them into Prometheus, Grafana, or custom dashboards.

Change Data Capture

Capture every database change (INSERT, UPDATE, DELETE) as an event with Debezium. Keep derived stores in sync.
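For illustration, a Debezium change event carries the row's state before and after the change plus an `op` code (`c` for create, `u` for update, `d` for delete). The event below is abbreviated and uses made-up values:

```json
{
  "before": { "id": 42, "status": "pending" },
  "after":  { "id": 42, "status": "shipped" },
  "source": { "table": "orders" },
  "op": "u",
  "ts_ms": 1700000000000
}
```

Downstream consumers (caches, search indexes, analytics stores) apply these events to stay in sync with the source database.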

When NOT to Use Kafka

Kafka is powerful, but it is not the right tool for everything:

- Simple task queues or request/response (RPC) patterns: RabbitMQ or SQS do the job with far less setup.
- Low-volume messaging, where operating a distributed cluster is not worth the overhead.
- Basic pub/sub between a handful of services that never needs retention or replay.

Key Takeaway: Use Kafka when you need to process high-volume event streams, decouple systems, or build real-time data pipelines. Use simpler tools (Redis, SQS, RabbitMQ) for basic task queues or low-volume messaging.

🔍 Deep Dive: Kafka's Origin Story

Kafka was created at LinkedIn in 2010 by Jay Kreps, Neha Narkhede, and Jun Rao to solve a specific problem: LinkedIn had dozens of systems that needed to share data (activity feeds, search indexing, recommendations, monitoring) but connecting them point-to-point created an unmaintainable web of integrations. Kafka provided a central nervous system — every system publishes events to Kafka, and every system reads the events it needs. The name "Kafka" was chosen because the system is "optimized for writing" (like the author Franz Kafka). It was open-sourced in 2011 and became an Apache project in 2012.

Practice Exercises

Easy Hello Kafka

Run a local Kafka broker (for example with Docker) and use the console producer and consumer tools to send and receive a few messages.

Easy Concept Check

Using the comparison table above, explain in your own words how Kafka's retention and replay model differs from a traditional message queue.

Medium Design a Pipeline

Pick one use case above (log aggregation, CDC, or stream processing) and sketch the topics, producers, and consumers you would need to implement it.