No More Free Lunch for Consistency Across Microservices

Hiroyuki Yamada | Jun, 2026 | 4 min read


Part 2 of “Architecture in the AI Era.” (Part 1 made the case that AI is reshaping when to split a system into microservices.)

Inside a single database, consistency is a free lunch. A transaction either commits in full or not at all, and while it’s running, no one sees it half-done. You almost never think about it, because you never have to ask for it.

Split that transaction across several services, each with its own database, and the lunch stops being free. The guarantee was a property of the single database doing the work. Take the work apart, and the guarantee doesn’t come with it. This is the bill that comes with microservices.

Nothing about this failure is loud. Nothing crashes; the system keeps responding. The data just quietly stops agreeing with itself.

The guarantee you stop getting for free

An order that reserves inventory, charges a card, and schedules a shipment used to be one transaction. Across services, it’s three local commits in three systems, each succeeding or failing on its own. So you inherit failure modes that simply couldn’t occur before. The charge succeeds, but the shipment never gets scheduled. Inventory is reserved against an order that the payment step then rejects. A second request reads a state that’s true in one service but not yet true in another, and makes a decision based on it.

None of these are bugs in the usual sense: no line of code is wrong. They’re the direct consequence of taking apart the very thing that guaranteed they couldn’t happen. This is the real cost of the move, and teams most often discover it in production rather than in design.

Two standard answers, and two oversimplifications

There are two well-known ways to deal with this. The industry tends to recommend one and dismiss the other, and both halves of that advice are too simple.

Saga. Run the operation as a sequence of local steps, each committed by its own service; if a later step fails, issue compensating actions to undo the earlier ones.
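To make those mechanics concrete, here is a minimal Python sketch of a centrally orchestrated saga for the order example. It illustrates the idea under stated assumptions and is not a production pattern. A single in-memory dict stands in for the three services’ separate databases. The hypothetical helpers (`Step`, `run_saga`, `set_flag`, `decline_card`, `order_saga`) stand in for each service’s local commit and compensation. The sketch also assumes that a step which raises has already rolled back its own local transaction. Sagas can also be choreographed through events instead of an orchestrator, but this sketch does not show that variant.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical sketch: one dict stands in for three services' separate
# databases; each function below stands in for one service's local commit.
State = dict


@dataclass
class Step:
    """One local transaction plus the compensating action that undoes it."""
    name: str
    action: Callable[[State], None]
    compensate: Callable[[State], None]


@dataclass
class SagaResult:
    succeeded: bool = False
    completed: list = field(default_factory=list)
    compensated: list = field(default_factory=list)


def run_saga(steps, state, observe=lambda snapshot: None):
    """Run steps in order; on failure, compensate committed steps in reverse."""
    result = SagaResult()
    done = []
    for step in steps:
        try:
            # Assumption: a step that raises has rolled back its own local
            # transaction, so only earlier, committed steps need undoing.
            step.action(state)
        except Exception:
            for prev in reversed(done):
                prev.compensate(state)
                result.compensated.append(prev.name)
            return result
        done.append(step)
        result.completed.append(step.name)
        # This local commit is already visible to other readers, before the
        # saga as a whole has finished: the pattern provides no isolation.
        observe(dict(state))
    result.succeeded = True
    return result


def set_flag(key, value):
    """Build a hypothetical local commit (or compensation) that sets one field."""
    def apply(s):
        s[key] = value
    return apply


def decline_card(s):
    """Hypothetical payment step that always fails."""
    raise RuntimeError("card declined")


def order_saga(charge):
    """The order workflow from the text: reserve, charge, ship."""
    return [
        Step("reserve_inventory", set_flag("inventory_reserved", True),
             set_flag("inventory_reserved", False)),
        Step("charge_card", charge, set_flag("charged", False)),
        Step("schedule_shipment", set_flag("shipment_scheduled", True),
             set_flag("shipment_scheduled", False)),
    ]


if __name__ == "__main__":
    state = {"inventory_reserved": False, "charged": False, "shipment_scheduled": False}
    seen = []
    result = run_saga(order_saga(decline_card), state, observe=seen.append)
    print(result.succeeded, result.compensated)  # False ['reserve_inventory']
    print(seen)   # [{'inventory_reserved': True, 'charged': False, 'shipment_scheduled': False}]
    print(state)  # {'inventory_reserved': False, 'charged': False, 'shipment_scheduled': False}
```

Running it with the declined card walks through the sequence described above. The inventory reservation commits, the charge fails, and compensation releases the reservation. The `seen` list shows that the reservation was visible to outside readers before it was undone, which is exactly the intermediate state a saga cannot hide.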
A saga is a good fit for long-running work and for steps that reach out to external systems or wait on a human. It’s also the approach most people now reach for by default.

What that default tends to skip over: a saga gives up isolation. Because each step commits on its own, its result is visible to everyone else before the overall operation finishes, or before a later failure rolls it back. Another transaction can read a state that’s about to be compensated away, or two operations can step on the same record and lose one of the updates. These aren’t implementation defects you can engineer out with enough care; they’re structural to the pattern. For plenty of workflows that’s an acceptable trade. The mistake is not noticing you’ve made it.

Two-phase commit. Coordinate the participants in two phases (first everyone confirms they can commit, then everyone commits together) so the operation is atomic across services and the in-between state stays hidden. This is the approach a lot of engineers have been taught to avoid: “2PC is slow, it blocks, it’s a single point of failure, it doesn’t scale.”

Here’s the part worth slowing down on. Almost every one of those criticisms is aimed not at the underlying idea of two-phase coordination but at XA, a specific, decades-old standard for distributed transactions. Blocking when the coordinator dies, the single point of failure, the requirement that every participant be an XA-compatible resource: those are properties of that design, not laws of nature. Modern, consensus-based implementations of the same two-phase idea address most of them. Dismissing the concept because that old design aged badly is its own kind of oversimplification.

The debate is framed wrong

So the common framing, “use Saga, avoid 2PC,” gets both sides wrong in opposite directions. It overstates Saga by waving away a real loss of isolation, and it understates two-phase approaches by judging the idea on the basis of one old design.
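For contrast, here is an equally minimal sketch of the two-phase shape, again in Python with hypothetical names (`Participant`, `two_phase_commit`, `between_phases`). In phase 1, each participant validates its change, stages it privately, and votes. Only if every vote is yes does phase 2 make the staged changes visible; otherwise every participant discards them. Everything runs in one process, with no durable log, timeouts, locks, or replicated coordinator. The sketch therefore says nothing about the failure handling that separates modern, consensus-based implementations from XA-era ones; it only shows where the in-between state goes.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Participant:
    """Hypothetical service that can stage a change now and apply it later."""
    name: str
    will_vote_yes: bool = True
    committed: dict = field(default_factory=dict)  # what other readers can see
    staged: dict = field(default_factory=dict)     # txid -> changes, kept private

    def prepare(self, txid: str, changes: dict) -> bool:
        # Phase 1: validate and stage the change, but expose nothing yet.
        if not self.will_vote_yes:
            return False
        self.staged[txid] = changes
        return True

    def commit(self, txid: str) -> None:
        # Phase 2: make the staged change visible.
        self.committed.update(self.staged.pop(txid))

    def abort(self, txid: str) -> None:
        # Discard staged work; harmless if nothing was staged.
        self.staged.pop(txid, None)

    def read(self, key: str):
        # Readers only ever see committed data.
        return self.committed.get(key)


def two_phase_commit(
    txid: str,
    work: list,
    between_phases: Optional[Callable[[], None]] = None,
) -> bool:
    """work is a list of (participant, changes) pairs."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [participant.prepare(txid, changes) for participant, changes in work]
    if between_phases is not None:
        between_phases()  # e.g. a concurrent reader looking in mid-transaction
    if all(votes):
        # Phase 2 (commit): everyone voted yes, so everyone applies.
        for participant, _ in work:
            participant.commit(txid)
        return True
    # Phase 2 (abort): any "no" vote means nobody applies anything.
    for participant, _ in work:
        participant.abort(txid)
    return False


if __name__ == "__main__":
    inventory = Participant("inventory")
    payments = Participant("payments")
    shipping = Participant("shipping")
    seen = []
    ok = two_phase_commit(
        "order-42",
        [
            (inventory, {"sku-1": "reserved"}),
            (payments, {"order-42": "charged"}),
            (shipping, {"order-42": "scheduled"}),
        ],
        between_phases=lambda: seen.append(inventory.read("sku-1")),
    )
    print(ok, seen, inventory.read("sku-1"))  # True [None] reserved
```

The `between_phases` hook reads inventory after every participant has prepared but before anyone commits, and it sees nothing. That is the hidden in-between state a saga gives up. If a participant is built with `will_vote_yes=False`, no service ends up with a committed change.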
Neither is universally right, because they aren’t even answering the same question. One trades consistency for flexibility and reach; the other trades flexibility for a strong guarantee. The interesting question was never “which one do we standardize on?” It’s “which one does this operation actually need?” And underneath that: where should the responsibility for getting consistency right live, now that more of the application layer is being written by AI that won’t reason about any of this on your behalf? That’s what I’ll take on in the next piece.

If you’ve run into this: was it a saga’s missing isolation that bit you — the charged-but-not-shipped...

