We went multi-region then undid it | Autumn
Last year we started onboarding companies with a global customer base. As our own users began appearing in more regions, we decided to build a multi-region architecture to reduce latency globally.
Initially, our services were isolated to one region, us-west.
Our aim was to reduce latency in two regions to start, us-west and us-east, with a target round-trip latency under 50ms. The main difficulty was that this target applied to both reads and writes, so simply adding DB read replicas wasn’t an option: replicas speed up reads, but every write still has to travel to the primary. Ultimately, there were two major considerations:
How to spin up our server in multiple regions
More crucially though, how to make data reads and writes low latency across regions
Spinning up our server in multiple regions
There were two options here. Either we went serverless with something like Cloudflare Workers, or we manually spun up stateful servers in different regions. We went with the latter for a couple of reasons:
The whole point of this was to reduce latency. With serverless, we were worried about inconsistent latencies caused by cold starts, which our benchmarks confirmed.
Our server was already stateful, and going serverless would’ve broken patterns we relied on. Event batching, for one, gets painful when every request runs in an isolated session.
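To illustrate why a long-lived server makes event batching easy, here is a minimal Python sketch of an in-process batcher. `EventBatcher`, `flush_fn`, and `max_size` are hypothetical names, not our actual code, and a real version would also flush on a timer.

```python
import threading
from typing import Callable, List


class EventBatcher:
    """Buffers events in process memory and hands them off in bulk.

    Relies on a long-lived process: an isolated per-request runtime would
    discard the buffer when each request ends.
    """

    def __init__(self, flush_fn: Callable[[List[dict]], None], max_size: int = 100):
        self._flush_fn = flush_fn  # e.g. one bulk INSERT instead of one per event
        self._max_size = max_size
        self._buffer: List[dict] = []
        self._lock = threading.Lock()  # concurrent requests share the buffer

    def add(self, event: dict) -> None:
        with self._lock:
            self._buffer.append(event)
            if len(self._buffer) < self._max_size:
                return
            # Swap the full buffer out while holding the lock...
            batch, self._buffer = self._buffer, []
        # ...then do the (potentially slow) write without blocking other requests.
        self._flush_fn(batch)

    def flush(self) -> None:
        """Flush any remainder, e.g. from a background timer or at shutdown."""
        with self._lock:
            batch, self._buffer = self._buffer, []
        if batch:
            self._flush_fn(batch)


# Usage: collect flushed batches in a list instead of writing to a database.
flushed: List[List[dict]] = []
batcher = EventBatcher(flushed.append, max_size=3)
for i in range(7):
    batcher.add({"event_id": i})
batcher.flush()
print([len(b) for b in flushed])  # [3, 3, 1]
```

The buffer only survives between requests because the same process handles them all; that shared memory is exactly what an isolated per-request runtime takes away.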
This blog from Unkey was really helpful when we made our decision. Now our next challenge was deciding on a provider. Our requirements were simple:
Latency should be as low as possible
Spinning up multi-region servers should be as simple as possible
Surprisingly, we tried almost every provider we could find and none of them fit perfectly. We ultimately chose AWS ECS, managed through Flightcontrol, where we spun up an ECS service in us-west and us-east, then used Route53 to route requests based on region.
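We don’t specify which Route53 routing policy is involved above; geolocation routing is one option, latency-based routing another. Assuming latency-based routing with one alias record per region pointing at that region’s load balancer, the records could be built as below. The domain, AWS region codes, DNS names, and hosted zone IDs are placeholders. The dictionary follows the `ChangeBatch` shape that boto3’s `route53` client accepts in `change_resource_record_sets`.

```python
def latency_record_upserts(domain: str, endpoints: dict) -> dict:
    """Build a Route53 ChangeBatch with one latency-based alias record per region.

    `endpoints` maps an AWS region code to a tuple of
    (load balancer DNS name, load balancer hosted zone ID).
    """
    changes = []
    for region, (lb_dns_name, lb_zone_id) in sorted(endpoints.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": domain,
                "Type": "A",
                "SetIdentifier": f"api-{region}",  # must be unique within the record set
                # Route53 returns the record for the region with the lowest
                # latency to the client's DNS resolver.
                "Region": region,
                "AliasTarget": {
                    "HostedZoneId": lb_zone_id,
                    "DNSName": lb_dns_name,
                    "EvaluateTargetHealth": True,  # skip this record if its target is unhealthy
                },
            },
        })
    return {"Comment": "Latency-based routing for a multi-region API", "Changes": changes}


# Placeholder values; real ones come from your load balancers and hosted zone.
batch = latency_record_upserts(
    "api.example.com",
    {
        "us-west-2": ("west-lb.example.elb.amazonaws.com", "ZPLACEHOLDERWEST"),
        "us-east-1": ("east-lb.example.elb.amazonaws.com", "ZPLACEHOLDEREAST"),
    },
)
# With AWS credentials configured, this could be applied via boto3:
#   boto3.client("route53").change_resource_record_sets(
#       HostedZoneId="Z_YOUR_ZONE_ID", ChangeBatch=batch)
```

Because the routing decision happens when the client resolves the domain, requests then go straight to the chosen region’s load balancer with no extra proxy hop.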
To explain why we came to this decision, it’s worth walking through the other top contenders.
Render
We were originally on Render so this seemed like the obvious choice. However, Render doesn’t natively support multi-region, so to set this up we had to manually create instances in each region. More annoyingly though, the only way to have a single domain route to different instances was to use Cloudflare’s load balancer.
In the end, we chose AWS over Render because we found that Cloudflare’s load balancer added latency compared to Route53, which resolves routing at the DNS layer. Render also involved extra hops, since Render itself puts Cloudflare in front of its services.
Railway
Railway was extremely compelling because it supported multi-region natively. You could spin up a single service, have it replicated across regions, and Railway would handle load balancing, provisioning, and more for you. The DX was unmatched. Unfortunately, Railway’s infra isn’t on AWS; they build their own machines. This meant a couple of things:
Our database, cache, and other data stores wouldn’t be co-located with our server, unless we used Railway for those as well, which was too limiting for us
Most of our users were also hosted on AWS, so their servers wouldn’t be as close to ours
Ultimately, with both providers, the decision came down to latency. AWS consistently provided the lowest latencies in our benchmarks.
That said, ECS came with a bunch of maintenance overhead, especially coming from Render. Even with Flightcontrol, we had to build an internal dashboard to build and deploy across regions at once. Moreover, application and load balancer logs were an absolute pain to set up. But today I’m very glad we made the tradeoff. Having lower-level control over our infra has been useful, and AI has made things much easier too.
Making data reads and writes multi-region
The bigger challenge we faced was with data access: making both reads and writes fast across regions. Think of us as a complex rate limiter. Before a request is allowed through, we often need to update usage counters atomically and decide whether the customer still has access.
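An atomic "check and deduct" can be sketched as a single conditional `UPDATE`, so no other request can slip in between reading the balance and writing it. This sketch uses Python’s built-in `sqlite3` to stay self-contained; the `balances` table and `try_consume` helper are hypothetical. The same single-statement pattern should carry over to Postgres.

```python
import sqlite3


def try_consume(conn: sqlite3.Connection, customer_id: str, feature_id: str, amount: int) -> bool:
    """Deduct `amount` only if enough balance remains; return whether access is allowed."""
    with conn:  # commit on success, roll back on exception
        cur = conn.execute(
            """
            UPDATE balances
               SET remaining = remaining - ?
             WHERE customer_id = ? AND feature_id = ? AND remaining >= ?
            """,
            (amount, customer_id, feature_id, amount),
        )
    # The balance check lives in the WHERE clause, so check and write are one
    # statement: if the balance is too low, no row matches and nothing changes.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
with conn:
    conn.execute(
        "CREATE TABLE balances ("
        " customer_id TEXT, feature_id TEXT, remaining INTEGER NOT NULL,"
        " PRIMARY KEY (customer_id, feature_id))"
    )
    conn.execute("INSERT INTO balances VALUES ('cus_1', 'messages', 2)")

results = [try_consume(conn, "cus_1", "messages", 1) for _ in range(3)]
print(results)  # [True, True, False]
```

Every allow/deny decision is a write, which is why read replicas alone can’t make this path fast in a second region.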
For example, when you send a message to Cursor, they may deduct an estimated number of credits before accepting your message, then reconcile the actual usage afterwards. Since these writes sit on the hot path, they need to be real-time and fast. We considered several approaches to solving this.
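The reserve-then-reconcile pattern can be sketched as below. This is a hypothetical, single-process illustration, not how Cursor actually implements it: `CreditLedger` keeps the balance in memory behind a lock, whereas a production system would keep it in a shared data store.

```python
import threading


class CreditLedger:
    """Hypothetical single-process ledger illustrating reserve-then-reconcile."""

    def __init__(self, balance: int):
        self.balance = balance
        self._lock = threading.Lock()

    def reserve(self, estimate: int) -> bool:
        """Hot path: hold the estimated cost before accepting the request."""
        with self._lock:
            if self.balance < estimate:
                return False  # reject up front instead of discovering overuse later
            self.balance -= estimate
            return True

    def reconcile(self, estimate: int, actual: int) -> None:
        """After the work finishes, refund or charge the difference."""
        with self._lock:
            # A positive difference refunds credits; a negative one charges the
            # extra, which can leave the balance slightly negative.
            self.balance += estimate - actual


ledger = CreditLedger(balance=100)
allowed = ledger.reserve(30)  # estimate 30 credits for the message
ledger.reconcile(30, 22)      # actual usage was 22, so 8 credits are refunded
print(allowed, ledger.balance)  # True 78
```

Only `reserve` sits on the hot path; `reconcile` can run after the response is sent, so its latency matters less.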
A master database per region
We’d spin up a Postgres database in each region, completely isolated from the others, and let our users pick which region their data lives in, so it sits closest to their server. The catch, beyond running multiple databases, is that our users’ own customers might be spread across regions. For example, if a user runs on Cloudflare Workers, their requests can originate from wherever their customers are, so pinning the whole account to one region doesn’t hold up.
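To make the catch concrete, here is a hypothetical routing function for the per-region-database design: each account picks a home region, and every request for that account goes to that region’s database, wherever the request arrived. Account names, region labels, and connection strings are all made up.

```python
from dataclasses import dataclass

# Hypothetical: each account (one of our users) picks a home region.
ACCOUNT_HOME_REGION = {"acme": "us-west", "globex": "us-east"}

# Hypothetical connection strings for one isolated Postgres per region.
REGION_DATABASES = {
    "us-west": "postgres://db.us-west.example.com/app",
    "us-east": "postgres://db.us-east.example.com/app",
}


@dataclass
class Route:
    database_url: str
    cross_region: bool  # True if the request must leave the region it arrived in


def route_request(account_id: str, arrival_region: str) -> Route:
    """Send every request for an account to its home region's database."""
    home = ACCOUNT_HOME_REGION[account_id]
    return Route(REGION_DATABASES[home], cross_region=(home != arrival_region))


# acme chose us-west, but one of acme's customers reaches our us-east server.
route = route_request("acme", "us-east")
print(route.cross_region)  # True
```

Any of acme’s customers served from us-east would pay a cross-region round trip on every request, which is the latency this whole project set out to remove.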
A region per customer
Instead of pinning our user, we could pin a customer: our user’s user. Each customer is tied to a region, and all...