Why Multigres has its own connection pooler


Two jobs, two processes: why Multigres has its own connection pooler

Most connection poolers do two jobs in one process: accept client connections, and manage backend connections. In Multigres, we split them. Multigateway accepts clients; multipooler manages backends. This post is about why.

What is Multigres?

Multigres is Vitess for Postgres. It provides horizontal scaling, connection pooling, and cluster orchestration for PostgreSQL deployments, the same set of problems Vitess has been solving for MySQL for more than a decade.

A Multigres cluster is made of a few coordinated services. Multigateway sits at the front, accepting client connections, speaking the Postgres wire protocol, and routing incoming queries to the right backend. Multipooler sits in front of each Postgres instance and manages the pool of backend connections to it - one multipooler per Postgres instance, colocated. Multiorch runs cluster orchestration: leader election, failover, and health monitoring. Underneath, your data is sharded across multiple Postgres instances, with the cluster topology stored in etcd. The full tour is in the architecture overview.

Within each shard, one Postgres instance is the leader - it accepts writes - while the others run as replicas that serve reads and stand by to take over. Multiorch watches the cluster and, when a leader fails or is being decommissioned, promotes a replica to take its place. Multigateway and multipooler need to know about these transitions as they happen - which instance is the current leader, when leadership is changing hands, when to drain in-flight requests - so they can route traffic correctly and not lose work during a handover. We'll lean on these terms - leader, replica, promotion, drain - throughout the rest of the series.
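To make those terms concrete, here is a minimal sketch of the view a router needs of one shard. The `Shard` class, its method names, and the instance names are hypothetical and are not Multigres code. It also assumes the old leader simply leaves the shard on promotion, which matches the "failed or decommissioned" case above.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Toy view of one shard: a single leader plus zero or more replicas."""

    leader: str
    replicas: list = field(default_factory=list)

    def target_for(self, is_write: bool) -> str:
        """Writes go to the leader; reads prefer a replica when one exists."""
        if is_write or not self.replicas:
            return self.leader
        return self.replicas[0]

    def promote(self, replica: str) -> None:
        """Make a replica the new leader (assumption: the old leader is gone)."""
        self.replicas.remove(replica)
        self.leader = replica


shard = Shard(leader="pg-a", replicas=["pg-b", "pg-c"])
write_target = shard.target_for(is_write=True)   # the current leader
shard.promote("pg-b")                            # pg-b now accepts writes
```

Anything that routes traffic has to see the effect of `promote` promptly, otherwise writes keep going to an instance that no longer accepts them.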

For this post, what matters is that Multigres ships its own connection pooler instead of bolting on PgBouncer. That's the part that surprises people. Postgres has PgBouncer. PgBouncer works. Why build another one?

Why not PgBouncer?

Before we go any further: PgBouncer is excellent. Many people reading this probably operate it every day and have strong, well-earned opinions about it. The reason Multigres has its own pooler is that we want the pooler to be deeply integrated with the rest of the cluster.

A Multigres cluster is a coordinated thing. Postgres instances come and go. Leader and replica roles flip during failover. Backups run on schedule and have to be coordinated with traffic. Multiorch, the orchestration component, needs to drain in-flight requests before promoting a new leader.

A pooler in this world has to participate in cluster coordination. When multiorch decides a Postgres instance should stop accepting writes, the pooler is what closes existing client transactions cleanly and refuses new ones, while multigateway starts buffering these requests. When a backup is in flight, the pooler knows. When a replica is promoted to leader, the pooler knows. When the cluster is doing a graceful shutdown, the pooler is the choke point that decides when "graceful" is actually safe.
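The drain behaviour described above (refuse new work, let in-flight work finish, then report that it is safe to proceed) can be sketched in a few lines. This is an illustration only: `DrainingPool` and its methods are hypothetical names, not the multipooler API, and a real pooler would track transactions on actual backend connections rather than a counter.

```python
import threading


class DrainingPool:
    """Toy model of a pooler that can be drained before a leader change."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._draining = False

    def begin(self) -> bool:
        """Admit a new request, or refuse it once a drain has started."""
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def finish(self) -> None:
        """Mark one admitted request as complete and wake any drain waiter."""
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def drain(self, timeout=None) -> bool:
        """Stop admitting work, then wait for in-flight requests to finish.

        Returns True if the pool emptied before the timeout expired.
        """
        with self._cond:
            self._draining = True
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

In this sketch the orchestrator would call `drain()` and promote a new leader only once it returns True. A False result means "graceful" is not yet safe, and the caller has to decide whether to wait longer or force the issue.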

PgBouncer is a sidecar. It can be configured, restarted, monitored, but it doesn't speak the cluster's coordination language. We needed something that does.

You might ask: why not contribute these capabilities upstream to PgBouncer? Honestly, the changes we'd need - cluster integration, the gateway/pooler split, full extended-protocol fidelity, plus everything else this series covers - together amount to a fork, not a patch. PgBouncer is excellent at what it sets out to do, and we didn't want to push it into being something it isn't. Building our own is the price of cluster integration.

Why split into two services?

Once we accept that we're building our own pooler, the next question is: what shape should it have? One process, or two?

Multigres' goal is to bring sharding to Postgres, which makes it a distributed database. And a distributed database changes the connection-layer requirements in two ways.

A single client connection has to reach any Postgres instance in the cluster. When a query arrives, the system might need to send it to one specific Postgres instance, or fan it out across many of them, depending on the data layout. The client doesn't, and shouldn't, know or care which one. That requires a process that owns the client connection and decides where queries go, independently of the Postgres instances themselves.
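As a rough sketch of that routing decision: when the sharding key is known, the query goes to one shard, and when it is not, it fans out to all of them. The shard names and the hash-based placement below are assumptions made for illustration. This post does not describe how Multigres actually maps data to shards.

```python
import zlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical shard names


def shard_for(key: str) -> str:
    """Map a sharding key to one shard with a stable hash (illustrative only)."""
    return SHARDS[zlib.crc32(key.encode("utf-8")) % len(SHARDS)]


def route(sharding_key=None) -> list:
    """Return the shards a query must visit.

    One shard when the sharding key is known, every shard otherwise.
    """
    if sharding_key is None:
        return list(SHARDS)
    return [shard_for(sharding_key)]


single = route("customer-42")  # exactly one shard, stable across calls
fan_out = route()              # all shards
```

The client sees neither list. It holds one connection to the process that made the decision.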

Aggregation happens at the top. A query that touches multiple shards needs its results combined before they go back to the client. That combine-the-results step is logically separate from any individual Postgres instance's connection pool. It belongs in a layer above all of them.
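Two common shapes of that combine step, sketched under simple assumptions: each shard returns rows already sorted by the requested key, or each shard returns a partial count. The function names are hypothetical, and real aggregation in a sharded database covers far more cases than these two.

```python
import heapq


def merge_sorted(shard_results, key):
    """Merge per-shard row lists that are each already sorted by `key`."""
    return list(heapq.merge(*shard_results, key=key))


def combine_counts(shard_counts):
    """Combine per-shard COUNT(*) results into one total."""
    return sum(shard_counts)


rows = merge_sorted(
    [[(1, "a"), (4, "d")], [(2, "b"), (3, "c")]],
    key=lambda row: row[0],
)
total = combine_counts([3, 0, 5])
```

Neither function needs to know anything about a backend connection pool, which is the point: this logic sits above every shard rather than beside any one of them.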

Both of those concerns naturally live in one process: a thing that accepts client connections, parses queries, decides where they go, and assembles their results. That's multigateway.

What's left is the per-Postgres-instance...
