How we rebuilt Postgres branch metrics on VictoriaMetrics | xata
How we rebuilt PostgreSQL branch metrics on VictoriaMetrics, per cell<br>How we rebuilt Xata's PostgreSQL branch metrics on a self-hosted VictoriaMetrics stack in six weeks, with zero user-visible downtime.
By: Alexis Rico<br>Published: Jun 2, 2026<br>Reading time: 6 min read
Every Xata branch is backed by real PostgreSQL. In the web console, each branch has a metrics view: CPU, memory, connections, disk I/O, network, WAL sync time, replication lag, database size. Support uses it to help customers. Internally, we read data of the same shape to make capacity decisions. It has to be fast, cheap, and correct.<br>Over the last six weeks we rebuilt the pipeline behind that view. Metrics now flow through a per-cell stack built around VictoriaMetrics instead of through a single central observability vendor in our control plane. We completed the migration with no downtime visible to customers.<br>This is a long write-up of how we got there and what we changed along the way.<br>The starting picture
Xata is a multi-region service. Each region has a "multi-cell" architecture: multiple copies of our stack run per region, depending on scale.<br>Every cell in every region sent its collected metrics via OpenTelemetry to a hosted third-party observability store. Inside one of our services, an HTTP client talked to the vendor's query API and shaped the response for the console.<br>It worked for a year. It was a good fit for a smaller fleet. Then we kept growing, and three things stopped lining up.<br>Why we moved<br>Scale<br>Branch metrics fanned in. Every Postgres pod in every region eventually sent data to the same external observability backend. That setup worked, but only up to the point where adding a new metric meant worrying first about how many new time series its labels would create.<br>We needed observability that lived next to the data, and a model that made bad labels expensive to write instead of expensive to query.<br>Performance<br>The metrics view in the console was slower than it had any reason to be. The page made one HTTP request per chart on every load, sequentially, because that's how the frontend codebase had grown over time. Each request went through a separate parse and plan on the vendor's API side.<br>The result was a handful of small queries for one branch over one time window, where one query plan with a fan-out would have done. On top of that, our cloud observability vendor was starting to lag under the weight of everything being consolidated into a single store.<br>Cost<br>The bill from the vendor was growing on a curve that tracked branch creation. The cardinality problem described above was the biggest single driver.<br>We fixed it in our internal telemetry pipeline, and the projected drop on the residual surface is significant. But the bigger takeaway was the lesson itself.
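To make the cardinality concern concrete: in a Prometheus-style data model, each distinct combination of label values is its own time series, so the worst-case series count for a metric is the product of the distinct values of its labels. The sketch below is illustrative only. The label names and counts are made up and do not come from our pipeline.

```python
from math import prod


def max_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of label values is a separate series,
    so the bound is the product of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical label sets, for illustration only.
base = {"cell": 4, "branch": 1_000, "pod": 2}

# The same metric with one accidental high-cardinality label added.
with_bad_label = {**base, "query_id": 500}

print(max_series(base))            # 8000
print(max_series(with_bad_label))  # 4000000
```

One extra label with 500 distinct values turns a bound of 8,000 series into 4,000,000, which is why a single careless label can dominate the cost of a whole metric.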
With one central store mediating every query, every accidental label becomes everyone's problem, and the only feedback channel is an invoice.<br>Why VictoriaMetrics<br>We did not go shopping for a new platform. We had a metrics product surface we needed to fix, and we wrote down what we needed before we picked tools.<br>PromQL-native. Our engineers already speak it. Our internal tools already know it. Adding a second query language for one product surface was a no.<br>Cheap per active series. The cardinality bruise we just described made us label-paranoid. VictoriaMetrics has a deserved reputation for being sparing with both memory and disk per active series.<br>One binary per role. vmsingle, vmagent, and vlogs are each a statically linked Go binary with a small config surface. The platform team is small, and we don't like running things we can't reason about end to end.<br>Per-cell deployable. Every Xata cell is already a self-contained Kubernetes cluster that has everything a customer's branch needs. Observability should live there too.<br>VictoriaMetrics fit each of those. We also rolled VictoriaLogs and Vector into the same chart, sized for the same per-cell footprint, because the foundation we wanted to build was bigger than today's metrics view.<br>The new shape
Two things to notice.<br>The store is per cell. Each cell runs its own vmsingle, its own vmagent, its own Vector DaemonSet. Nothing talks across cells. Cross-region OTLP is gone. One cell's observability can't take down another's.<br>The data plane owns the API. Reading a branch's metrics is now a gRPC call to the service that already manages CNPG inside the cell. The control plane no longer carries a hand-rolled HTTP client for a vendor query API. It makes one RPC, and the cell speaks PromQL to vmsingle locally.<br>The new client code is deliberately boring. VictoriaMetrics is PromQL verbatim, so there's nothing to translate. A thin...