What is Kafka Connect?
Kafka Connect is a framework for moving data between Kafka and other systems without writing code. It provides pre-built connectors for databases, file systems, cloud services, and more. Instead of writing custom producer/consumer code for each integration, you configure a connector with JSON and it handles the rest.
Source Connectors
Read data FROM external systems INTO Kafka topics. Example: Debezium CDC reads every INSERT/UPDATE/DELETE from MySQL and publishes to Kafka.
Sink Connectors
Write data FROM Kafka topics TO external systems. Example: Elasticsearch sink writes Kafka messages into Elasticsearch for search.
Popular Connectors
| Connector | Type | Use Case |
|---|---|---|
| Debezium | Source | Change Data Capture from MySQL, PostgreSQL, MongoDB, SQL Server |
| JDBC Source | Source | Poll database tables for new/changed rows |
| S3 Sink | Sink | Write Kafka messages to S3 as Parquet/Avro/JSON files |
| Elasticsearch Sink | Sink | Index Kafka messages for search and analytics |
| JDBC Sink | Sink | Write Kafka messages to any JDBC database |
| BigQuery Sink | Sink | Stream data into Google BigQuery |
| FileStream | Both | Read/write files (testing and demos only) |
Configuring a Connector
{
"name": "mysql-cdc-source",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "mysql-host",
"database.port": "3306",
"database.user": "debezium",
"database.password": "secret",
"database.server.id": "1",
"topic.prefix": "myapp",
"database.include.list": "mydb",
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
"schema.history.internal.kafka.topic": "schema-changes"
}
}
# Deploy a connector via REST API
curl -X POST http://localhost:8083/connectors \
-H "Content-Type: application/json" \
-d @mysql-cdc-config.json
# List running connectors
curl http://localhost:8083/connectors
# Check connector status
curl http://localhost:8083/connectors/mysql-cdc-source/status
# Pause/Resume a connector
curl -X PUT http://localhost:8083/connectors/mysql-cdc-source/pause
curl -X PUT http://localhost:8083/connectors/mysql-cdc-source/resume
⚠️ Common Mistake: Using JDBC Source when you need CDC
JDBC Source polls the database on a schedule (e.g., every 10 seconds). It misses DELETE operations and has latency. Debezium CDC reads the database's binary log and captures every change in real-time — inserts, updates, AND deletes. Use Debezium for real-time sync; JDBC Source only for batch/periodic loads.
Practice Exercises
Medium Build a Mini Project
Combine concepts from this tutorial to build a small utility or tool.
Medium Debug Challenge
Introduce a bug in one of the code examples and practice finding and fixing it.
Hard Refactoring Exercise
Rewrite one example using a different approach and compare the tradeoffs.