Taming Production Database Migrations

Taming production database migrationsTaming production database migrations

Disclosure: Neon paid me to write this article. The argument and opinions are mine.

Production migrations are scary. But you can do it safely, without downtime or crashing clients.

Here’s how.

The two boundaries

Three things are in play.

Database. Changed by migrations. A migration can change the schema, data, or both.

Server. Changed by code deploys. It consumes the schema, exposes the API, and owns the mapping.

Clients. Changed by user updates, reloads, installs, or not at all. They consume the API on their own schedule.

That gives you two boundaries.

Schema boundary. Database <> server. The shared surface is the schema.

API boundary. Server <> client. The shared surface is the API.

Atomicity

The dream is to avoid coordinating schema and code at all. Change the schema, deploy the code, and have the whole thing land as one all-or-nothing operation.

You can get close at the schema boundary because you control both moving parts. You run the migration. You deploy the server. If you can afford to stop traffic for the release, or if you trust your rollback path, that can be a perfectly reasonable tradeoff. Plenty of smaller apps live there and do fine.

You cannot get that at the API boundary because you do not control clients in the wild. A breaking API change breaks every mobile app, desktop app, web app, and CLI until they update. Those clients move when they move.

That is the limit. Atomicity can buy simplicity where you control both sides. Expand-and-contract works everywhere because it does not need atomicity.

Expand-and-contract

This is how you keep shipping with zero downtime, no broken clients, and no permanent compatibility cruft.

Expand-and-contract is a discipline where you change an interface in steps, so the provider and the consumer do not have to move at the same moment.

Instead of replacing the old shape with the new shape in one move, you keep the old shape working, add the new shape beside it, move consumers over, then remove the old shape.

At every step, every live consumer has a shape it understands.

From principle to example

Consider a production system with a database, a server exposing a REST API, and clients that move on different schedules. The web app reloads quickly. The mobile app updates slowly. The desktop app and CLI may sit on old versions for a long time.

Now imagine you need to rename users.bio to users.description without breaking API clients that still use bio.

Current state.

Database has users.bio

API exposes bio

Clients read and write bio

Target state.

Database has users.description

API exposes description

Clients read and write description

The objective is to land the rename with zero downtime, no maintenance window, no crashed clients, and no permanent compatibility cruft. The old shape should be gone at the end, not carried forever as a tax on every future change.

The same rename crosses two boundaries. The schema side moves on your schedule. The API side moves on the clients’ schedule.

At the schema boundary

At the schema boundary, the consumer is your own server. That is why this part can finish quickly.

Schema and data migrations

A migration can change the schema, the data, or both. Schema changes alter the shape the server sees. Data changes move rows into that shape.

The objective is to keep the schema non-breaking at all times.

Forward-only migrations

Prefer forward-only migrations. A down migration is often impossible and rarely tested well enough to trust under pressure. Instead, fix mistakes with another forward migration.

The rename at the schema boundary takes six moves. Each move is one logical change, a schema migration, a data migration, or a server deploy.

Current state

This is the baseline. Everything is simple and aligned.

state API shape { bio: string } DB schema { bio: string }

server mapping read api.bio ← db.bio write api.bio → db.bio Move 1, schema expands

Run a schema migration that adds description as nullable.

state API shape { bio: string } ← unchanged DB schema { bio: string, description: string|null } ← sparse

server mapping read api.bio ← db.bio write api.bio → db.bio Old server code still reads and writes bio, and that remains legal. The new column exists, but nothing depends on it yet.

Move 2, server follows

Deploy server code that starts dual-writing.

state API shape { bio: string } ← unchanged DB schema { bio: string, description: string|null } ← sparse

server mapping read api.bio ← db.bio write api.bio → db.bio, db.description The server still reads from bio, so old rows are fine. New writes fill both columns.

Move 3, data catches up

Run a data migration that backfills description = bio where description is null.

state API shape { bio: string } ← unchanged DB schema { bio: string, description: string|null } ← backfilled, in sync

server mapping read api.bio ← db.bio write api.bio → db.bio, db.description The...

Taming Production Database Migrations

Related Articles

(no title)

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org

Apertus – Open Foundation Model for Sovereign AI

The labor share of income in the US is at its lowest post-war level