Zero-Downtime Deployments with Docker Compose – No Kubernetes Required


June 24, 2026

There's a mass delusion in the industry that you need Kubernetes to run a serious production service. You don't. At StatusDude, we serve thousands of monitoring checks per minute, run multi-region workers, and deploy multiple times a day, all with Docker Compose and HAProxy. Zero dropped requests. Zero downtime. No etcd to babysit at 3 AM.

But we didn't start with HAProxy. We started with Traefik. That lasted about four hours.

We Tried Traefik First

Traefik is the popular choice for Docker-based setups. It auto-discovers services via Docker labels, has a slick dashboard, and the docs make it look effortless. We set up two backend replicas with Traefik labels, ran a rolling deploy, and watched everything fall apart.

"Service defined multiple times"

Our first deploy strategy was to run a backend_new service alongside the existing backend during the transition. Both had the same Traefik routing labels — same Host rule, same service definition. Makes sense, right? You want both old and new to serve traffic during the cutover.

Traefik disagreed. Its Docker provider treats each Compose service as a separate configuration source. Two services with the same labels? "Service defined multiple times." 404 on every request. No fallback, no merge, just a flat refusal to route anything.

We reworked the approach to use docker compose up --scale backend=4 instead of a separate service. That avoided the label conflict. But it uncovered the next problem.

The Scale-Down Race

The rolling deploy strategy: scale up to 4 replicas (2 old + 2 new), then scale back down to 2 (keeping only the new ones). Simple enough.

Except Traefik's internal routing table didn't update fast enough. We'd scale down from 4 to 2, and Traefik would keep routing to containers that were in the process of shutting down. 502s on every other request. The routing state lagged behind Docker's reality by several seconds — long enough to drop a significant chunk of traffic.
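A minimal sketch of that scale-up, scale-down sequence, assuming a Compose service named backend as in the rest of this post. The helper names `scale_command` and `scale_up_then_down` are hypothetical, and the code only builds the `docker compose up --scale` command lines rather than running them. It also says nothing about which containers Compose removes on the way down; a real script has to make sure the old ones are the ones that go.

```python
from typing import List


def scale_command(service: str, replicas: int) -> List[str]:
    """Build a `docker compose up` invocation that sets the replica count.

    --no-recreate leaves containers that already exist untouched, so scaling
    up only adds new containers next to the old ones.
    """
    return [
        "docker", "compose", "up", "-d",
        "--no-deps", "--no-recreate",
        "--scale", f"{service}={replicas}",
        service,
    ]


def scale_up_then_down(service: str, target: int) -> List[List[str]]:
    """Return the two commands of the scale-up/scale-down strategy, in order."""
    return [
        scale_command(service, target * 2),  # old and new replicas side by side
        scale_command(service, target),      # back to the steady-state count
    ]
```

Each returned list could be handed to `subprocess.run(cmd, check=True)`. The gap between the second command and the proxy noticing the change is where the 502s came from.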

We tried adding delays. We tried disconnecting containers from the network before stopping them (so the health check would fail cleanly before removal). We tried passive health checks — added them, then immediately rolled them back because they were too aggressive and caused false positives.

None of it was clean. But the real killer was something else entirely.

The Killer: No Retry on a Different Backend

This is a known issue that the Traefik developers seem to have ignored for a while now: https://github.com/traefik/traefik/issues/2723

Here's the scenario: during a rolling deploy, you stop an old container. docker stop sends SIGTERM. Uvicorn starts its graceful shutdown, but there's a window — requests that are already in-flight, or requests that arrive between the stop signal and Traefik updating its routing table.

When that request hits the dying backend, the connection drops mid-stream. The client gets a raw error — empty response, connection reset, partial body.

We can't have that. When your service and heartbeat monitors report that they're up, we need to acknowledge it!

Now here's what Traefik does with that failed request: nothing.

Traefik's retry middleware exists, but it retries on the same backend. The one that's dying. The one that will fail again. It doesn't redispatch to a healthy backend. The request is just... lost.

We tried every combination: passive health checks, disconnect-before-stop, retry middleware with different attempt counts. The fundamental problem remained: Traefik couldn't send a failed request to a different server.
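The difference between retrying on the same backend and redispatching to another one can be shown with a toy model. This is only an illustration of the two behaviors described above, not how Traefik or any other proxy is implemented; the function name `dispatch` and the backend names are made up for the example.

```python
from typing import Dict, List, Optional


def dispatch(backends: Dict[str, bool], first: str, retries: int,
             redispatch: bool) -> Optional[str]:
    """Toy model: return the backend that served the request, or None.

    `backends` maps a backend name to whether it currently accepts requests.
    `first` is the backend the load balancer picked initially.
    """
    order: List[str] = list(backends)
    current = first
    for _ in range(retries + 1):  # one initial attempt plus `retries` retries
        if backends[current]:
            return current
        if redispatch:
            # Move on to the next backend instead of repeating the failure.
            current = order[(order.index(current) + 1) % len(order)]
    return None


# One container is shutting down, the other is healthy.
pool = {"old": False, "new": True}
same_backend = dispatch(pool, "old", retries=3, redispatch=False)
other_backend = dispatch(pool, "old", retries=3, redispatch=True)
```

In this model, retrying without redispatch never succeeds no matter how many attempts are allowed, while a single redispatch is enough to reach the healthy container.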

That afternoon, we ripped out Traefik and reached for HAProxy.

What You Actually Need

Let's strip it down. What does zero-downtime deployment actually require?

Multiple backend instances — so you can replace one while the other serves traffic

A load balancer that retries on a different backend — so dying containers don't drop requests

A deploy script that replaces instances one at a time — rolling update

That's it. Three things. Let me show you how we do each one.
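Before going through the steps, here is a rough sketch of the third item, replacing instances one at a time and only stopping an old container once its replacement reports healthy. It assumes the containers define a Docker health check; the function names and the injected callables (`start_replacement`, `stop_container`, `is_ready`) are hypothetical, chosen so the loop can be exercised without Docker.

```python
import subprocess
import time
from typing import Callable, Iterable, List


def health_status(container_id: str) -> str:
    """Ask Docker for a container's health status, e.g. "starting" or "healthy".

    Assumes the container has a health check; without one the template fails.
    """
    result = subprocess.run(
        ["docker", "inspect", "--format", "{{.State.Health.Status}}", container_id],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_healthy(container_id: str,
                 probe: Callable[[str], str] = health_status,
                 timeout: float = 60.0,
                 interval: float = 1.0) -> bool:
    """Poll `probe` until it returns "healthy" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe(container_id) == "healthy":
            return True
        time.sleep(interval)
    return False


def rolling_replace(old_ids: Iterable[str],
                    start_replacement: Callable[[], str],
                    stop_container: Callable[[str], None],
                    is_ready: Callable[[str], bool]) -> List[str]:
    """Replace containers one at a time and return the new container IDs.

    An old container is stopped only after its replacement is ready, so
    there is always at least one instance serving traffic.
    """
    new_ids: List[str] = []
    for old_id in old_ids:
        new_id = start_replacement()
        if not is_ready(new_id):
            raise RuntimeError(
                f"replacement {new_id} never became healthy; {old_id} left running"
            )
        stop_container(old_id)
        new_ids.append(new_id)
    return new_ids
```

In a real script, `is_ready` could be `wait_healthy` and `stop_container` a thin wrapper around `docker stop`. Passing them in as parameters keeps the ordering logic separate from the Docker calls.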

Step 1: Multiple Replicas with Docker Compose

Docker Compose has a built-in deploy.replicas setting:

# docker-compose.yml
services:
  backend:
    build: ./backend
    deploy:
      replicas: 2
    image: myapp-backend
    expose:
      - "8000"
    env_file: .env
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
      interval: 5s
      timeout: 5s
      retries: 3
      start_period: 5s
    restart: unless-stopped

That's 2 backend containers running behind a shared Docker DNS name backend. When you resolve backend inside the Docker network, you get both container IPs.
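One way to see this for yourself is a quick lookup from any container on the same Docker network. The snippet below is a small sketch using Python's standard `socket` module; the helper name `resolve_replicas` is made up for the example, and the hostname `backend` and port 8000 are taken from the Compose file above.

```python
import socket
from typing import List


def resolve_replicas(hostname: str, port: int) -> List[str]:
    """Return the distinct IP addresses that `hostname` resolves to."""
    infos = socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})
```

Calling `resolve_replicas("backend", 8000)` from inside the network should list one address per running replica.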

One Dockerfile, one image, two containers. No pod specs, no deployments, no replica sets.

Step 2: HAProxy as the Load Balancer

HAProxy is battle-tested, fast, and the configuration is readable. But the real reason we chose it: option redispatch...
