Kafka Streams

Stateful stream processing with KStream, KTable, windowing, and joins.

Intermediate 40 min read 📨 Kafka

What is Kafka Streams?

Kafka Streams is a client library for building real-time stream processing applications. Unlike Spark Streaming or Flink, it doesn't require a separate cluster — your stream processing logic runs inside your application as a regular Java/Kotlin dependency. It reads from Kafka topics, processes data, and writes results back to Kafka topics.

No separate infrastructure needed

Kafka Streams is just a library in your application. No cluster to manage, no job scheduler, no special deployment. Scale by adding more instances of your application. Kafka handles partition assignment automatically.
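To make that concrete, here is a minimal sketch of a complete Kafka Streams application. The application id, broker address, and topic names are placeholders for illustration; only the Properties keys and the KafkaStreams lifecycle calls are real API.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class UppercaseApp {
    public static void main(String[] args) {
        // Minimal configuration: an application id (used for the consumer group
        // and state-store naming) plus the broker address.
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");      // placeholder id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // placeholder broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Define the topology: read a topic, transform, write back to Kafka.
        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")   // placeholder topic names
               .mapValues(v -> v.toUpperCase())
               .to("output-topic");

        // The processing runs inside this JVM; no cluster submission step.
        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Starting a second copy of this same program with the same application.id makes Kafka rebalance the input partitions across the two instances; that is the whole scaling story.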

KStream vs KTable

Kafka Streams has two core abstractions:

| Concept | Model | Think of it as | Example |
| --- | --- | --- | --- |
| KStream | Event stream | Append-only log of facts | Page views, clicks, orders placed |
| KTable | Changelog | Latest value per key (like a DB table) | User profiles, product inventory, account balance |

A KStream treats every record as an independent event. A click at 10:00 and a click at 10:05 are two separate events. A KTable treats records as updates — if user Alice's balance changes from $100 to $150, the KTable only keeps $150.
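The distinction shows up directly in how you read a topic. A sketch, with placeholder topic names:

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;

StreamsBuilder builder = new StreamsBuilder();

// KStream: every record is an independent event; nothing is overwritten.
KStream<String, String> clicks = builder.stream("clicks");

// KTable: records are upserts per key. A later balance for "alice"
// replaces the earlier one, and a record with a null value deletes the key.
KTable<String, String> balances = builder.table("account-balances");
```

Reading the same topic as a KStream or a KTable changes only the interpretation, not the stored data.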

Stream Operations

Stateless Operations

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> events = builder.stream("raw-events");

// Filter: keep only purchase events
KStream<String, String> purchases = events
    .filter((key, value) -> value.contains("purchase"));

// Map: transform values
KStream<String, String> enriched = purchases
    .mapValues(value -> addTimestamp(value));

// Branch: split the stream by condition. KStream#branch is deprecated
// since Kafka 2.8; split() is the current API.
Map<String, KStream<String, String>> branches = events
    .split(Named.as("split-"))
    .branch((key, value) -> value.contains("error"), Branched.as("errors"))
    .defaultBranch(Branched.as("rest"));

// Write results to output topics
enriched.to("purchase-events");
branches.get("split-errors").to("error-events");

Stateful Operations

// Count events per key (e.g., page views per URL)
KTable<String, Long> viewCounts = events
    .groupByKey()
    .count();

// Windowed aggregation: count per 5-minute window
KTable<Windowed<String>, Long> windowedCounts = events
    .groupByKey()
    .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
    .count();

// Join KStream with KTable (enrich events with user data)
KStream<String, String> enrichedOrders = orders
    .join(userTable, (order, user) -> order + " by " + user);

Key Takeaway: Use filter/map/flatMap for transformations, groupByKey().count() or aggregate() for aggregations, and join() to enrich streams with reference data. State is automatically backed by Kafka changelog topics.

Windowing

Windowing groups events into time-based buckets for aggregation. Kafka Streams supports three window types:

| Window Type | Behavior | Use Case |
| --- | --- | --- |
| Tumbling | Fixed-size, non-overlapping | "Count per 5 minutes" — each event in exactly one window |
| Hopping | Fixed-size, overlapping | "5-min window, advance every 1 min" — events in multiple windows |
| Session | Dynamic, based on activity gap | "Group events until 30 min of inactivity" — user sessions |

🔍 Deep Dive: State Stores and Fault Tolerance

Stateful operations (count, aggregate, join) maintain local state in RocksDB on disk. This state is backed by Kafka changelog topics — if your application crashes, a new instance rebuilds the state by replaying the changelog. This is automatic and transparent. You get fault-tolerant, distributed state without an external database.
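That local state can also be read from inside the application via interactive queries. A sketch: the store name "view-counts" is hypothetical and assumes the aggregation was materialized under it, and `streams` is a running KafkaStreams instance.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Assumes the topology named its store, e.g.:
//   events.groupByKey().count(Materialized.as("view-counts"));  // hypothetical
ReadOnlyKeyValueStore<String, Long> store = streams.store(
    StoreQueryParameters.fromNameAndType(
        "view-counts", QueryableStoreTypes.keyValueStore()));

// Latest count for this key, served from the local RocksDB instance.
Long count = store.get("/home");
```

This is what makes a Kafka Streams app usable as a lightweight queryable view, without copying the state into an external database.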

Kafka Streams Processing Topology:

Source (input topic) → Filter (predicate) → Map (transform) → GroupBy (aggregate) → Sink (output topic)

Practice Exercises

Medium: Build a Mini Project

Combine concepts from this tutorial to build a small utility or tool.

Medium: Debug Challenge

Introduce a bug in one of the code examples and practice finding and fixing it.

Hard: Refactoring Exercise

Rewrite one example using a different approach and compare the tradeoffs.