What is Kafka Streams?
Kafka Streams is a client library for building real-time stream processing applications. Unlike Spark Streaming or Flink, it doesn't require a separate cluster — your stream processing logic runs inside your application as a regular Java/Kotlin dependency. It reads from Kafka topics, processes data, and writes results back to Kafka topics.
No separate infrastructure needed
Kafka Streams is just a library in your application. No cluster to manage, no job scheduler, no special deployment. Scale by adding more instances of your application. Kafka handles partition assignment automatically.
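To make "just a library" concrete, here is a minimal sketch of a complete application, assuming the kafka-streams dependency is on the classpath. The application id, topic names, and bootstrap address are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class App {
    public static void main(String[] args) {
        Properties props = new Properties();
        // application.id doubles as the consumer group and state-store prefix
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Trivial passthrough topology: read one topic, write another
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Running a second copy of this process with the same `application.id` is all it takes to scale out: Kafka rebalances the input partitions across the instances.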
KStream vs KTable
Kafka Streams has two core abstractions:
| Concept | Model | Think of it as | Example |
|---|---|---|---|
| KStream | Event stream | Append-only log of facts | Page views, clicks, orders placed |
| KTable | Changelog | Latest value per key (like a DB table) | User profiles, product inventory, account balance |
A KStream treats every record as an independent event. A click at 10:00 and a click at 10:05 are two separate events. A KTable treats records as updates — if user Alice's balance changes from $100 to $150, the KTable only keeps $150.
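The difference can be illustrated in plain Java. This is an analogy, not the Kafka Streams API: the same records go into an append-only list (stream semantics) and a latest-value-per-key map (table semantics):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StreamVsTable {
    record Record(String key, String value) {}

    // KStream semantics: every record is an independent fact, all are kept.
    static List<Record> asStream(List<Record> records) {
        return new ArrayList<>(records);
    }

    // KTable semantics: each record is an upsert; only the latest value per key survives.
    static Map<String, String> asTable(List<Record> records) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Record r : records) {
            table.put(r.key(), r.value());
        }
        return table;
    }
}
```

Feeding in Alice's balance updates `$100` then `$150`, the "stream" holds both records while the "table" holds only `$150`.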
Stream Operations
Stateless Operations
```java
StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> events = builder.stream("raw-events");

// Filter: keep only purchase events
KStream<String, String> purchases = events
    .filter((key, value) -> value.contains("purchase"));

// Map: transform values (addTimestamp is an application-defined helper)
KStream<String, String> enriched = purchases
    .mapValues(value -> addTimestamp(value));

// Branch: split stream by condition
// (branch() is deprecated since Kafka 2.8 in favor of split(), but still works)
KStream<String, String>[] branches = events.branch(
    (key, value) -> value.contains("error"), // errors
    (key, value) -> true                     // everything else
);

// Write results to output topics
enriched.to("purchase-events");
branches[0].to("error-events");
```
Stateful Operations
```java
// Count events per key (e.g., page views per URL)
KTable<String, Long> viewCounts = events
    .groupByKey()
    .count();

// Windowed aggregation: count per 5-minute window
KTable<Windowed<String>, Long> windowedCounts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();

// Join a KStream with a KTable (enrich orders with user data)
KStream<String, String> enrichedOrders = orders
    .join(userTable, (order, user) -> order + " by " + user);
```
Use filter/map/flatMap for stateless transformations, groupByKey().count() or aggregate() for aggregations, and join() to enrich streams with reference data. State is automatically backed by Kafka changelog topics.
Windowing
Windowing groups events into time-based buckets for aggregation. Kafka Streams supports three window types:
| Window Type | Behavior | Use Case |
|---|---|---|
| Tumbling | Fixed-size, non-overlapping | "Count per 5 minutes" — each event in exactly one window |
| Hopping | Fixed-size, overlapping | "5-min window, advance every 1 min" — events in multiple windows |
| Session | Dynamic, based on activity gap | "Group events until 30 min of inactivity" — user sessions |
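The bucketing arithmetic behind tumbling and hopping windows can be sketched in plain Java. This illustrates the semantics only; `tumblingStart` and `hoppingStarts` are made-up helper names, not Kafka Streams internals:

```java
import java.util.ArrayList;
import java.util.List;

public class WindowDemo {
    // Tumbling: each timestamp falls into exactly one window,
    // starting at the nearest lower multiple of the window size.
    static long tumblingStart(long ts, long sizeMs) {
        return ts - (ts % sizeMs);
    }

    // Hopping: a timestamp falls into every window of the given size
    // whose start (a multiple of advanceMs) is at most ts and whose
    // end (start + sizeMs) is after ts.
    static List<Long> hoppingStarts(long ts, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        long first = Math.max(0, ts - sizeMs + advanceMs);
        first = first - (first % advanceMs); // align to the advance interval
        for (long s = first; s <= ts; s += advanceMs) {
            starts.add(s);
        }
        return starts;
    }
}
```

For a timestamp at 10:10 with a 5-minute tumbling window, the event lands in the single window starting at 10:05; with a 5-minute hopping window advancing every minute, the same event lands in five overlapping windows.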
🔍 Deep Dive: State Stores and Fault Tolerance
Stateful operations (count, aggregate, join) maintain local state in RocksDB on disk. This state is backed by Kafka changelog topics — if your application crashes, a new instance rebuilds the state by replaying the changelog. This is automatic and transparent. You get fault-tolerant, distributed state without an external database.
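If you want a predictable changelog topic name, or plan to query the state later with interactive queries, you can name the state store explicitly. A short sketch; the store name is a placeholder:

```java
// Name the store backing the count. Its changelog topic is then
// <application-id>-view-counts-store-changelog.
KTable<String, Long> viewCounts = events
    .groupByKey()
    .count(Materialized.as("view-counts-store"));
```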
Practice Exercises
Medium: Build a Mini Project
Combine concepts from this tutorial to build a small utility or tool.
Medium: Debug Challenge
Introduce a bug in one of the code examples and practice finding and fixing it.
Hard: Refactoring Exercise
Rewrite one example using a different approach and compare the tradeoffs.