A self-tuning write throttle for batch Postgres jobs

marksie19881 pts0 comments

A self-tuning write throttle for batch Postgres jobs | TotalDebug

≡ menu

// technology. inside out

$ tree ~

├─~/homeh

├─~/categoriesc

├─~/tagst

├─~/archivea

├─~/projectsp

└─~/abouti

© 2026 Steven Marks<br>built with >_

>_ Python · Database

A self-tuning write throttle for batch Postgres jobs

Steven Marks

22 Jun 2026 · 10 min read

I have a background job that writes a lot of rows. It chunks a big import into batches and upserts each batch into Postgres. On its own that’s fine. The problem is that it doesn’t run on its own: the same database is serving live traffic and a handful of other workers at the same time. When my import goes full tilt, everyone else feels it.

The naive fix is to drop a sleep() between batches and call it a day. I did exactly that, and it bugged me for weeks, because a fixed sleep is wrong in both directions at once. So I replaced it with a small feedback controller that tunes itself, and it’s one of those changes that’s far simpler than it sounds. This is how it works and why I’d reach for it again.

Why a fixed sleep is the wrong tool

Say you settle on time.sleep(0.5) between chunks. You’ve now baked in a single number that has to be right for every situation, and it never is.

When the database is idle overnight, that half-second does nothing useful. It just makes a job that could finish in five minutes take fifteen, for no benefit to anyone.

When the database is genuinely under load, half a second might not be anywhere near enough, and your import keeps piling pressure on at exactly the wrong moment.

And the killer: the “right” number changes the moment anything else does. Scale the database up a tier and your sleep is now too conservative. Scale it down, or let two of these jobs run at once, and it’s too aggressive. You end up tuning a magic constant by hand every time the environment shifts, which is precisely the kind of toil I try to design out.

What I actually want is simple to state: go full speed when the database is bored, back off when it’s busy, and never make me re-tune it when the hardware changes.

Use latency as the control signal

Here’s the insight that makes the whole thing work. I don’t need to measure CPU, connection counts, or replication lag to know whether the database is struggling. I can read it straight off the thing I’m already doing: how long each chunk’s write takes.

Per-chunk latency is a brilliant signal because it folds in everything I care about, for free:

A bigger or faster database does my write in less time, so latency drops.

A loaded database does it more slowly, so latency rises.

Pressure from other workers shows up here too. If they’re hammering the database, my queries get slower as well, and that’s exactly the signal I want to react to.

So instead of guessing at a sleep duration, I pick a latency budget : a target time per chunk that represents “this is a healthy pace”. If a chunk comes back under budget, the database has room to spare and I don’t sleep at all. If it comes back over budget, I sleep in proportion to how far over I am. The budget is a stable number tied to acceptable latency, not to the size of the box, so it keeps working when the box changes.

The controller

The heart of this is a single pure function. No database handle, no clock, no hidden state: you hand it the latest measurement and the previous smoothed value, and it hands you back a sleep duration and the new smoothed value. That’s deliberate, because a pure function is trivial to test, and the gnarly part of any throttle is the decision logic, not the plumbing.

def next_chunk_backoff(observed_ms, ema_ms, *, target_ms, max_sleep_ms, factor, alpha=0.5):<br>"""Return (sleep_seconds, new_ema_ms). Proportional backoff on EMA-smoothed<br>per-chunk latency; zero when under the target budget."""<br>new_ema = observed_ms if ema_ms is None else (alpha * observed_ms + (1 - alpha) * ema_ms)<br>overage = new_ema - target_ms<br>if overage 0:<br>return 0.0, new_ema<br>return min(max_sleep_ms, factor * overage) / 1000.0, new_ema

There are only three ideas in there:

Smooth the measurement. Raw per-chunk timings are jittery. One chunk hits a cold cache, the next is instant, and if I reacted to every spike I’d oscillate between sprinting and crawling. An exponential moving average (the ema_ms term) damps that down, so the controller responds to a sustained trend rather than a single unlucky chunk. alpha sets how reactive it is: higher means it trusts the latest reading more.

Back off proportionally. If the smoothed latency is over budget, I sleep by factor times the overage. Twice as far over budget means twice as long backing off. It’s the proportional term of a classic controller, and nothing more clever is needed here.

Cap it. max_sleep_ms stops a pathological reading from parking the job for a minute. Backoff is bounded no matter how ugly the latency gets.

The nicest property is what happens at normal load: the overage is zero or negative, so the function returns a 0.0...

database sleep latency chunk budget write

Related Articles