I Built Our Entire Data Lake on DynamoDB Streams Instead of Kafka | by Illya Yalovoy | Jun, 2026 | Medium
I Built Our Entire Data Lake on DynamoDB Streams Instead of Kafka
Illya Yalovoy
A practical architecture walkthrough with cost comparison and operational tradeoffs from real production.

The Problem With Running Kafka on a Small Team

Our team was four engineers building a data pipeline for analytics. Nobody’s job title said “platform engineer” or “Kafka administrator.” But Kafka does not care about your org chart. It demands attention.

The operational surface area is enormous: broker configuration, ZooKeeper coordination (or KRaft migration if you are on newer versions), partition rebalancing when load shifts, consumer group offset management, schema registry maintenance, disk monitoring across every broker, retention policy tuning, and network throughput planning.

The Apache Kafka documentation lists over a hundred configuration parameters across brokers, topics, and clients. “Just use MSK,” people say. And yes, Amazon MSK removes some of the pain. But “managed” does not mean “zero-ops.” You still choose your broker count, instance types, storage per broker, VPC configuration, security groups, and authentication method. You still handle cluster scaling decisions. You still debug consumer lag. The minimum viable MSK cluster is 3 brokers on kafka.m5.large, which runs about $460 per month before you store a single event or transfer a single byte.

We tracked it over a quarter. Kafka-related operational work consumed 15–20% of our on-call time. For a…
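
The roughly $460 per month quoted for the minimum MSK cluster can be sanity-checked with a few lines of arithmetic. This is only a sketch: the hourly rate of $0.21 for kafka.m5.large and the 730-hour month are assumptions of mine, not figures from the article, and actual pricing varies by region and over time. The function name `monthly_broker_cost` is hypothetical.

```python
# Back-of-the-envelope check of the broker-only MSK baseline.
# Assumed inputs (NOT from the article): hourly rate and hours per month.
BROKER_COUNT = 3          # minimum cluster size quoted in the article
HOURLY_PRICE_USD = 0.21   # assumed on-demand rate for kafka.m5.large
HOURS_PER_MONTH = 730     # common billing approximation (8760 hours / 12)


def monthly_broker_cost(brokers: int, hourly_usd: float,
                        hours: int = HOURS_PER_MONTH) -> float:
    """Compute-only monthly cost; excludes storage and data transfer."""
    return brokers * hourly_usd * hours


baseline = monthly_broker_cost(BROKER_COUNT, HOURLY_PRICE_USD)
print(f"${baseline:,.2f} per month before storage or transfer")
```

Under those assumptions the result lands within a dollar of the article's figure. Storage per broker and data transfer are billed on top of this, which is why the number is a floor and not an estimate of the full bill.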