Kafka Rebalances: What's Happening Under the Hood


Kafka Rebalances: What’s Actually Happening Under the Hood | by Mina Tafreshi | Jun, 2026 | Medium


Mina Tafreshi


Most articles about Kafka rebalances stop right at the edge of being useful. They tell you that consumers “stop, rebalance and resume.” They tell you there’s a Group Coordinator and a Group Leader. They tell you to tune session.timeout.ms. And then they end with a configuration table, leaving you no better equipped to reason about failures you haven't seen before.

This isn’t that article. We’re going down to the wire protocol, the state machine, and the exact binary structures written to __consumer_offsets, because that's the level at which you actually understand what's going wrong when rebalances won't stop.

From: https://www.confluent.io/learn/kafka-rebalancing/

The Group Coordinator is a state machine

The most important reframe: the Group Coordinator isn’t a registry that passively tracks membership. It’s a finite state machine with five states, running inside a specific broker process. Every JoinGroup, SyncGroup, Heartbeat, and LeaveGroup request is a state transition event, not a lookup. The diagram shows the four states a live group moves through; the fifth, Dead, marks a group whose metadata is being removed.

    Empty
      │ first member joins
    PreparingRebalance ◀──── member joins / leaves / heartbeat timeout
      │ all JoinGroups received (or rebalance_timeout_ms expires)
    CompletingRebalance
      │ all SyncGroups received
    Stable ──────── any topology change ────▶ PreparingRebalance

When you see REBALANCE_IN_PROGRESS in your consumer logs, the coordinator is in PreparingRebalance, returning that error to heartbeating consumers to signal: stop what you're doing and send JoinGroup. When a consumer crashes mid-rebalance and the entire group restarts the process, that's because any topology change while in CompletingRebalance kicks the state machine straight back to PreparingRebalance.

The coordinator lives on a specific broker, determined by hashing your group.id to a partition of __consumer_offsets. The broker that leads that partition is the group's coordinator:

    coordinator_partition = abs(hash(group.id)) % num_partitions(__consumer_offsets)

Here hash is Java's String.hashCode of the group ID, not the murmur2 hash that the producer's default partitioner applies to record keys.

With the default of 50 partitions, all your consumer groups are spread across at most 50 coordinator brokers. Under heavy rolling deployment load, this concentration matters.
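To make the "requests are transition events" framing concrete, here is a minimal Python sketch of the four live states from the diagram. The State and Event enums and the next_state function are hypothetical names invented for illustration; the broker's real implementation tracks far more (member metadata, timers, pending responses), so treat this as a toy model of the diagram rather than Kafka's code.

```python
from enum import Enum, auto


class State(Enum):
    EMPTY = auto()
    PREPARING_REBALANCE = auto()
    COMPLETING_REBALANCE = auto()
    STABLE = auto()


class Event(Enum):
    MEMBER_JOINED = auto()
    MEMBER_LEFT = auto()
    HEARTBEAT_TIMEOUT = auto()
    ALL_JOINS_RECEIVED = auto()   # also stands in for rebalance_timeout_ms expiring
    ALL_SYNCS_RECEIVED = auto()


# Events that change who is in the group.
TOPOLOGY_CHANGES = {Event.MEMBER_JOINED, Event.MEMBER_LEFT, Event.HEARTBEAT_TIMEOUT}

# Forward edges of the diagram.
FORWARD = {
    (State.EMPTY, Event.MEMBER_JOINED): State.PREPARING_REBALANCE,
    (State.PREPARING_REBALANCE, Event.ALL_JOINS_RECEIVED): State.COMPLETING_REBALANCE,
    (State.COMPLETING_REBALANCE, Event.ALL_SYNCS_RECEIVED): State.STABLE,
}


def next_state(state: State, event: Event) -> State:
    """Return the state after applying one event (toy model of the diagram)."""
    if (state, event) in FORWARD:
        return FORWARD[(state, event)]
    # Any topology change in a non-empty group restarts the rebalance.
    if state is not State.EMPTY and event in TOPOLOGY_CHANGES:
        return State.PREPARING_REBALANCE
    return state  # event is not meaningful in this state; ignore it


def run(events, start: State = State.EMPTY) -> State:
    """Fold a sequence of events over the state machine."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Feeding run the sequence MEMBER_JOINED, ALL_JOINS_RECEIVED, HEARTBEAT_TIMEOUT reproduces the crash-mid-rebalance case: the group reaches CompletingRebalance and is pushed straight back to PreparingRebalance.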

What actually crosses the wire

The JoinGroup request carries more than most engineers realize. The metadata bytes inside it aren't opaque. They hold a versioned binary structure called ConsumerProtocolSubscription:

    JoinGroup Request (v9) =>
      group_id              STRING
      session_timeout_ms    INT32
      rebalance_timeout_ms  INT32
      member_id             STRING           ← empty on first join
      group_instance_id     NULLABLE_STRING  ← non-null = static member
      protocol_type         STRING
      protocols             ARRAY
        name                STRING           ← e.g. "cooperative-sticky"
        metadata            BYTES
          ↳ ConsumerProtocolSubscription =>
              version           INT16
              topics            ARRAY[STRING]
              owned_partitions  ARRAY        ← [topic, partitions[]]
              generation_id     INT32

The owned_partitions field is the physical mechanism behind incremental cooperative rebalancing. When the elected group leader (just a regular consumer instance, not a special process) receives the JoinGroup response, it gets every member's ConsumerProtocolSubscription, meaning it can see exactly who owns what before computing the new assignment. Without that field, cooperative rebalancing is architecturally impossible.

The response is intentionally asymmetric. Followers receive an empty members array. Only the leader's response contains the full membership metadata. This is a deliberate tradeoff: in a 20-member group, 19 of the 20 consumers get a small payload, and only the one that needs the full data receives it.

Once the leader computes the plan, it sends it back via SyncGroup:

    SyncGroup Request (v5) =>
      group_id        STRING
      generation_id   INT32    ← stale value → ILLEGAL_GENERATION
      member_id       STRING
      assignments     ARRAY    ← non-empty only in the leader's request
        member_id     STRING
        assignment    BYTES
          ↳ ConsumerProtocolAssignment =>
              version     INT16
              partitions  ARRAY [topic STRING, partitions INT32[]]

Followers send SyncGroup with an empty assignments array. The broker writes the full assignment to __consumer_offsets and replies to each member with its own slice.
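As a rough illustration of what "versioned binary structure" means here, the following Python sketch packs and unpacks a subscription with the fields listed above, using big-endian integers and length-prefixed strings. It is written under assumptions: the function names are hypothetical, and it also writes a nullable user_data bytes field between topics and owned_partitions, which the listing above does not show but which I understand to be part of the schema. Treat the exact byte layout as illustrative, not as a verified wire-compatible encoder.

```python
import struct


def _string(value: str) -> bytes:
    """STRING: INT16 length followed by UTF-8 bytes."""
    raw = value.encode("utf-8")
    return struct.pack(">h", len(raw)) + raw


def encode_subscription(topics, owned_partitions, generation_id, version=2) -> bytes:
    """Pack a subscription; owned_partitions maps topic -> list of partition ids."""
    out = struct.pack(">h", version)
    out += struct.pack(">i", len(topics))
    for topic in topics:
        out += _string(topic)
    out += struct.pack(">i", -1)  # user_data as null bytes (assumed field, see lead-in)
    out += struct.pack(">i", len(owned_partitions))
    for topic, partitions in owned_partitions.items():
        out += _string(topic)
        out += struct.pack(">i", len(partitions))
        for partition in partitions:
            out += struct.pack(">i", partition)
    out += struct.pack(">i", generation_id)
    return out


def decode_subscription(buf: bytes) -> dict:
    """Inverse of encode_subscription; this is what a group leader would parse."""
    pos = 0

    def read(fmt):
        nonlocal pos
        (value,) = struct.unpack_from(fmt, buf, pos)
        pos += struct.calcsize(fmt)
        return value

    def read_string():
        nonlocal pos
        length = read(">h")
        text = buf[pos:pos + length].decode("utf-8")
        pos += length
        return text

    version = read(">h")
    topics = [read_string() for _ in range(read(">i"))]
    user_data_len = read(">i")
    if user_data_len > 0:
        pos += user_data_len  # skip opaque user data
    owned = {}
    for _ in range(read(">i")):
        topic = read_string()
        owned[topic] = [read(">i") for _ in range(read(">i"))]
    generation_id = read(">i")
    return {
        "version": version,
        "topics": topics,
        "owned_partitions": owned,
        "generation_id": generation_id,
    }
```

A leader that decodes every member's bytes this way ends up with a map of who currently owns which partitions, which is the input a cooperative assignor needs before it decides what to move.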

Generation IDs: how Kafka prevents zombie consumers from corrupting state

Every JoinGroup response includes a generation_id, a monotonic integer that increments every time a rebalance completes. Every subsequent heartbeat, offset commit, and SyncGroup carries that generation ID. A stale one gets rejected with ILLEGAL_GENERATION (error code 22).

Why this matters: imagine a consumer that owns partition P0 and hits a long GC pause. The coordinator times it out, runs a rebalance, increments the generation to N+1, and assigns P0 to a different consumer, which starts committing progress. When the GC-paused consumer wakes up and tries to commit its own offset, its request still carries generation N. The coordinator rejects it. The consumer...
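The fencing check in the zombie scenario above can be sketched in a few lines of Python. ToyCoordinator and its methods are hypothetical names, and the model keeps only a generation counter and a committed-offset map; the real coordinator also validates member IDs and much else. The error code 22 comes from the text, and 0 is used for "no error".

```python
NONE = 0
ILLEGAL_GENERATION = 22


class ToyCoordinator:
    """Toy model: one group, one generation counter, one offset map."""

    def __init__(self):
        self.generation = 0
        self.owners = {}      # partition -> member_id
        self.committed = {}   # partition -> offset

    def complete_rebalance(self, assignment: dict) -> int:
        """Install a new assignment and bump the generation."""
        self.generation += 1
        self.owners = dict(assignment)
        return self.generation

    def commit(self, member_id: str, generation: int, partition: str, offset: int) -> int:
        """Accept the commit only if it carries the current generation."""
        if generation != self.generation:
            return ILLEGAL_GENERATION  # stale member: fenced off
        self.committed[partition] = offset
        return NONE


coordinator = ToyCoordinator()
gen_n = coordinator.complete_rebalance({"P0": "consumer-a"})

# consumer-a stalls in a GC pause; the group rebalances without it.
gen_n1 = coordinator.complete_rebalance({"P0": "consumer-b"})
fresh_result = coordinator.commit("consumer-b", gen_n1, "P0", 120)

# consumer-a wakes up and commits with the generation it last saw.
zombie_result = coordinator.commit("consumer-a", gen_n, "P0", 95)
```

In this run the zombie's commit returns ILLEGAL_GENERATION and the committed offset for P0 stays at the value written by consumer-b, so the stale consumer cannot rewind the group's progress.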
