Federating Clusters for Zero-Downtime Kubernetes | Linkerd
Dominik Táskai, Linkerd Ambassador
Jun 24, 2026 • 15 min read

Every multi-region setup eventually meets the same awkward moment: a whole
cluster goes away, and the identical copy of your service running two regions
over might as well not exist, because nothing is wired to treat them as one
thing. Failover becomes a runbook: restore, repoint DNS, and wait out an outage
that, on paper, you’d already paid to survive.

Linkerd’s multicluster extension closes that gap by letting several clusters
present a service as a single, load-balanced endpoint. The part that the
official tasks gloss over is that a real platform almost never picks one
multicluster mode. Some services want federation (same service everywhere, one
endpoint, automatic failover), while others want mirroring (reach a specific
remote service by name), and you frequently want both patterns living on the
same set of links. The docs walk through each mode on its own. This post wires
all three modes together across three GKE clusters, with a full-mesh link
topology, a chaos test that takes out an entire cluster, and scripts you can
clone and run on a fresh GCP project.

Companion repo: every script referenced here lives in this repository. Feel
free to clone it, set your project ID, and run it.

Linkerd multicluster modes: Gateway, flat, and federated

Linkerd’s multicluster extension supports three modes.
The nice thing is they’re
not mutually exclusive: on the same set of linked clusters, the mode is chosen
per service via a label.

| Mode | Label | What happens | Network requirement |
|---|---|---|---|
| Hierarchical (gateway) | `mirror.linkerd.io/exported=true` | Service mirrored as `<service>-<link>`, traffic routed through a gateway | Gateway IP reachable |
| Flat (pod-to-pod) | `mirror.linkerd.io/exported=remote-discovery` | Service mirrored as `<service>-<link>`, traffic goes directly to remote pods | Flat network (pod IPs routable) |
| Federated | `mirror.linkerd.io/federated=member` | All same-name services unioned into `<service>-federated`, load balanced across all clusters | Flat network (pod IPs routable) |

The distinction that matters operationally is that hierarchical mirroring works
on any network, because only the gateway IP needs to be reachable, while flat
and federated modes need real pod-to-pod connectivity. On GCP, VPC-native GKE
clusters on peered VPCs give you that flat network for free, so you can run
federated services for your core workloads over a flat network and still mirror
a specialized service through a gateway from a cluster that isn’t on that
network. Most platform teams I’ve seen end up with exactly this kind of mix.

Multi-region architecture: GKE cluster setup

We have three GKE clusters across three regions, fully linked to each other (six
directional links total). Three demo services run on them, each using a
different multicluster mode:

- frontend is federated and runs in all three clusters. A single federated
frontend service in each cluster load-balances across all nine pods (3 replicas
× 3 clusters). When a cluster goes down, the remaining six pods absorb the
traffic with no application changes.
- api is flat-mirrored and runs in west and east. The north cluster
consumes it as api-west and api-east, which are explicit remote service
names with traffic sent straight to the remote pods.
This is what you reach for
when the client needs to decide which backend it talks to, for example, to keep
a request in-region for data locality.
- analytics is gateway-mirrored and runs only in east. It is exported through
the Linkerd gateway, so west and north reach it as analytics-east-gw without
needing flat-network connectivity to east’s pods. It’s here mainly to prove
that gateway mode coexists with flat and federated modes on the same links.

Deployment prerequisites: GKE, Linkerd, and CLI tools

- A GCP account (free-tier credits cover this: three standard clusters with
small node pools)
- gcloud CLI, authenticated (gcloud auth login)
- kubectl v1.28+
- step CLI for certificate generation (brew install step)
- helm v3
- ~30 minutes for the full setup

The infra script enables the compute and container APIs for you, so a
brand-new project works out of the box.

Step 0: Configure

Clone the repo, create a local .env file from the example file, and customize
it for your GCP project. The defaults are enough for the rest of the demo, so in
most cases you only need to change the project ID.

git clone
cd blog-linkerd-federation
cp env.example .env
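
Before filling in .env, it can be worth confirming that the four CLIs from the prerequisites list are actually on your PATH. The snippet below is a minimal sketch and not part of the companion repo: the missing_tools helper is a hypothetical name introduced here, and it checks only that each binary is present, not its version or whether gcloud is authenticated.

```shell
#!/bin/sh
# Hypothetical helper (not from the companion repo): print the name of every
# tool passed as an argument that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    # `command -v` exits non-zero when the tool is not found.
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# The four CLIs named in the prerequisites list.
missing="$(missing_tools gcloud kubectl step helm)"

if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names join onto one line.
  echo "Missing prerequisites:" $missing >&2
else
  echo "All prerequisite CLIs found."
fi
```

The script only reports and never exits, so it is safe to paste into an interactive shell before running anything that creates GCP resources.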

Open .env and set at least your project ID. The file ships with sensible
defaults for everything else:

export GCP_PROJECT="your-project-id"

export REGION_WEST="us-central1"
export REGION_EAST="us-east1"
export REGION_NORTH="europe-west1"

# One zone per region. We pin node-locations to a single zone so num-nodes is
# the TOTAL node count; see the cost note below for why this matters.
export...