PostgresBench: A Reproducible Benchmark for Postgres Services

Lionel Palacin Apr 2, 2026 · 11 minutes read

For years, we have focused on building fast systems. ClickHouse is an example of that focus. Performance is not a feature we add later. It is a core design goal from the start.

We applied a similar approach when building our managed Postgres service. The result is one of the fastest managed Postgres services available to our customers. Postgres handles transactional workloads, while ClickHouse handles analytical workloads. Together they form a unified data stack that provides a "best-of-breed" foundation for SaaS and AI applications.

With that in mind, it felt natural to evaluate it the same way we evaluate ClickHouse: with a public, reproducible benchmark.

That is why we built PostgresBench, a benchmark to compare managed Postgres services.

From ClickBench to PostgresBench

ClickBench is a widely referenced OLAP benchmark. It benchmarks more than 40 databases using a transparent and reproducible methodology. All queries, datasets, and results are public and anyone can validate the numbers or submit improvements.

PostgresBench applies the same methodology to transactional Postgres workloads. The rules are straightforward:

Use a well-understood, standard workload

Keep infrastructure consistent across all services tested

Publish all configuration so results can be reproduced

Allow anyone to submit results or flag issues

If a number looks wrong, it can be checked. If a configuration is unfair, it can be fixed. That is the point.

Benchmark design

Workload

PostgresBench is built on pgbench, the standard Postgres benchmarking tool. We use the TPC-B-like workload it includes out of the box, which simulates short concurrent transactions with frequent writes and updates. It is a reasonable proxy for common transactional patterns: payments, order processing, inventory updates, and similar workloads that hit the database hard with small, frequent writes.

We chose pgbench deliberately. Tools like sysbench and Percona TPCC were originally designed for MySQL workloads. For a Postgres benchmark, pgbench is the more natural fit, and it ships with Postgres, which makes it easy for anyone to reproduce results without additional tooling.

Running parameters

Each benchmark run uses the following parameters:

pgbench -c 256 -j 16 -T 600 -M prepared -P 30 \
  -s $SCALE_FACTOR \
  -h $PGHOST -p $PGPORT -U $PGUSER -d $PGDATABASE

We ran each benchmark with 256 clients and 16 threads, which reflects realistic concurrency for a production transactional workload. Each run lasts 10 minutes, long enough to move past warmup and capture stable throughput.
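As a rough illustration of how the command above could be repeated for several runs, here is a minimal shell sketch. The function names `run_once` and `run_benchmark`, the `RUNS` and `LOG_DIR` variables, and the log file naming are all assumptions made for this example, not part of the published tooling. It also assumes the dataset has already been loaded at the chosen scale factor (for example with `pgbench -i -s`) and that `PGHOST`, `PGPORT`, `PGUSER`, and `PGDATABASE` are set.

```shell
#!/usr/bin/env bash
# Sketch only: RUNS, LOG_DIR, and the function names are illustrative assumptions.
# Assumes the pgbench tables were already initialized at the same scale factor.

RUNS="${RUNS:-3}"        # number of repetitions per configuration (assumed)
LOG_DIR="${LOG_DIR:-.}"  # where per-run output is written (assumed)

# Run the benchmark once, using the same flags as the command above.
run_once() {
  local scale="$1"
  pgbench -c 256 -j 16 -T 600 -M prepared -P 30 \
    -s "$scale" \
    -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE"
}

# Repeat the run RUNS times, keeping each run's output in its own file.
run_benchmark() {
  local scale="$1" run
  for run in $(seq 1 "$RUNS"); do
    run_once "$scale" > "$LOG_DIR/pgbench_s${scale}_run${run}.log" 2>&1
  done
}
```

Calling `run_benchmark 6849` would then leave one log file per run in `LOG_DIR`, which keeps individual runs available for later inspection.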

We tested two scale factors: 6849 (~100 GB) and 34247 (~500 GB). These correspond to dataset sizes typical of real Postgres deployments. The first represents an app that is getting started and growing quickly, with a working set that reasonably fits in cache. The second represents an app that has reached reasonable scale and is still growing, with a working set that starts spilling to disk. The gap between results at these two sizes tells you something useful about how a service handles storage pressure as data grows.
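To make the scale factors more concrete, the sketch below converts them into row counts. It relies on pgbench's documented table sizing (100,000 `pgbench_accounts` rows, 10 `pgbench_tellers` rows, and 1 `pgbench_branches` row per unit of scale factor). The helper names `accounts_rows` and `tellers_rows` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative assumptions.
# pgbench creates, per unit of scale factor:
#   100,000 rows in pgbench_accounts, 10 in pgbench_tellers, 1 in pgbench_branches.

accounts_rows() { echo $(( $1 * 100000 )); }
tellers_rows()  { echo $(( $1 * 10 )); }

# Print the row counts for the two scale factors used in the benchmark.
for scale in 6849 34247; do
  echo "scale=$scale accounts=$(accounts_rows "$scale") tellers=$(tellers_rows "$scale") branches=$scale"
done
```

At scale factor 34247 the accounts table holds about 3.4 billion rows, which no longer fits in a 32-bit integer. pgbench accounts for this by switching the account identifier columns to bigint once the scale factor reaches 20,000.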

Metrics captured

We report average TPS, average latency, P95 latency, and P99 latency across all three runs per configuration. We publish the ranking for the best and worst run, and the details of each individual run are available in the repository.
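The post does not describe how the percentile latencies are derived, so the following is only a sketch of one common approach: the nearest-rank percentile over a list of per-transaction latencies. It assumes the latencies are available one per line on standard input (for instance extracted from a per-transaction log, which the command shown earlier does not enable). The function name `percentile` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: nearest-rank percentile; not necessarily the method used by the benchmark.
# Usage: <latencies, one per line> | percentile 95

percentile() {
  # $1: percentile as a whole number between 1 and 100
  sort -n | awk -v p="$1" '
    { v[NR] = $1 }                       # store sorted values by rank
    END {
      if (NR == 0) exit 1                # no input, nothing to report
      idx = int((p * NR + 99) / 100)     # ceil(p/100 * NR) for integer p
      if (idx < 1) idx = 1
      print v[idx]
    }'
}
```

For example, `seq 1 100 | percentile 95` prints `95`. Nearest-rank always returns an observed value rather than an interpolated one, which keeps the result easy to trace back to an actual transaction.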

Fairness

No benchmark is perfectly neutral. Every choice, from instance type to Postgres configuration, can favor one system over another. We explain our thinking behind each decision below, and document the exact settings used for each system in the benchmark repository alongside the results.

Client machine setup

We provisioned a 16 vCPU, 64 GB instance in the us-east-2 region to run the benchmark client, sized to ensure the client is never the bottleneck. All services were tested in the same region, so results reflect only database performance, not cross-region network latency. We do not colocate client and database by availability zone, since not all services offer this capability. To be fair to the services that do, we may add this in the future. Contributions are welcome.

Instance selection

For most services, we targeted a 1:4 CPU-to-RAM ratio and tested two sizes: 4 vCPUs/16 GB RAM and 16 vCPUs/64 GB RAM. Aurora does not offer an instance class that provides this ratio, so we used a 1:8 ratio, also at two sizes: 4 vCPUs/32 GB RAM and 16 vCPUs/128 GB RAM.

We used Graviton instances with NVMe caching for all services that support them,...
