How to build a 30M RPS CDN in 30 days with Rust and WASM
Phin Walton
Jun 4, 2026
Railway has technically run an “edge” network for over 3 years, although it didn’t really live up to the “edge” name - it ran on our 4 Railway regions across the world, nowhere near the 100+ POPs you’d expect when thinking about edge computing.
However, starting this week, the CDN we built (which served you this very page!) is becoming available to all Railway customers at the click of a button.
This is the story of how we built a modern CDN: why we had to build it ourselves, how BGP anycast gets messy at global scale, and how we leverage WASM to ship dataplane changes without dropping a single packet.
💡 Note from the author: The 30M RPS in the title is real - we’ve absorbed DDoS attacks at that rate with zero service disruption. Day-to-day, however, the network serves ~1M RPS at peak traffic. Benchmarked end-to-end under ideal conditions, it tops out at around 150M RPS.
We had to build this ourselves
You may be reading this and asking yourself why we’d go to all the effort of building such a large system when we could just buy one off the shelf from another provider. The reality is that we initially did just that and tried to integrate it with our product, but quickly realized a few things:
- Our internal development velocity outpaces the rate at which these legacy platforms can iterate
- A lot of our users have unique, non-standard request & connection behaviors, which we’ve supported for years and which other providers are not engineered to support
But the thing I actually care about is what owning the stack lets us build into the product. Looking at our support threads, network-related issues (CDN configuration, nameservers and the like) made up around 20 percent of all ticket volume.
During the initial scoping of what a CDN might look like if we built it ourselves, we had a realization:
We realized we weren’t just building a CDN
We’re a cloud computing platform, not just a CDN - and that unlocks some unique benefits that traditional CDNs find very hard to execute correctly:
- We operate both the edge AND the origins, which means we can send traffic through the internet and backbone links directly to your workload, rather than trying to guess where the origin is located. Traditional CDNs find this very hard, and the data points they use to make this guess are very muddy. For example, even if an “origin” is 10ms away at layer 3, it may proxy an asset that lives 200ms away. Luckily for us, we already know exactly where your app runs, down to the longitude, latitude and altitude. This gives us a unique routing superpower.
- We can offer performant defaults that other CDNs are too scared to offer. For example, did you know that Cloudflare doesn’t cache HTML or JSON by default, even if your website’s Cache-Control header says to do so?
I believe this is because they’re afraid of confusing the developer: the developer expects website updates to appear when they refresh, and Cloudflare doesn’t know when you update your website. With the Railway CDN, however, we respect Cache-Control for HTML and automatically purge cached HTML when you deploy - the best of both worlds!
How we built and deployed Hikari in 30 days
Sorry, I forgot to mention - we call this new edge network and CDN Hikari. It is the Japanese word for “light” or “fiber optics”, the name of the second-fastest Tokaido Shinkansen (bullet train) service, and that one Blue Archive train conductor.
Since I’ll be diving into internals, it’s easier to use the internal name throughout this post.
I won’t go too deep into the specifics of the server configuration or the datacenter & provider lease agreements - that could be an entire post in itself! But to give you the rundown: we started procuring these a few months before we started building Hikari. We now have 60 POPs (we are still waiting for delivery of ~40% of these locations), over 180 CDN nodes and tens of terabits of network capacity. Each node is a 16-core EPYC with 256 GB of memory, 8 TB of NVMe storage and 100G networking.
Our CDN POPs (so far!) as displayed by our internal DCIM tooling
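To put the hardware and traffic numbers above in perspective, here is the back-of-the-envelope arithmetic as a small Rust sketch. It assumes exactly 180 identical nodes (the post says “over 180”) and perfectly even load across them. Anycast routing does not guarantee even load in practice, so the per-node figures are averages rather than what any single node actually sees. The constant and function names are illustrative only.

```rust
// Back-of-the-envelope fleet arithmetic using the figures quoted in the post.
// Assumptions (not from the post): exactly 180 identical nodes, and load spread
// perfectly evenly across them, which anycast routing does not guarantee.
const NODES: u64 = 180;
const CORES_PER_NODE: u64 = 16;
const NIC_GBPS_PER_NODE: u64 = 100;
const NVME_TB_PER_NODE: u64 = 8;

/// Total CPU cores across the fleet.
fn fleet_cores() -> u64 {
    NODES * CORES_PER_NODE
}

/// Sum of node NIC bandwidth in Tbps.
/// This counts only the nodes' own ports, not upstream transit or peering capacity.
fn fleet_nic_tbps() -> f64 {
    (NODES * NIC_GBPS_PER_NODE) as f64 / 1000.0
}

/// Total NVMe capacity across the fleet, in TB.
fn fleet_nvme_tb() -> u64 {
    NODES * NVME_TB_PER_NODE
}

/// Average requests per second per node if `total_rps` were spread evenly
/// (integer division, so the result is rounded down).
fn avg_rps_per_node(total_rps: u64) -> u64 {
    total_rps / NODES
}
```

Under those assumptions, the loads break down as follows:

- The ~1M RPS daily peak averages out to roughly 5.5k RPS per node.
- The 30M RPS DDoS works out to about 167k RPS per node.
- The 150M RPS benchmark ceiling works out to about 833k RPS per node.

The fleet totals 2,880 cores and about 1.44 PB of NVMe. The 18 Tbps result is only the sum of node NIC ports and says nothing about the network’s upstream capacity.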
Engineering in reverse
Usually when building a huge project like this, you think about the implementation, system and business logic first, and deployment later. But with over 180 nodes handling ~1 million RPS at peak hours, we knew we had to design this differently. To put it bluntly, we’ve been scared to upgrade our existing origin proxies because of the blast radius. We use Ansible and have been burned in the past by its lossy, brittle, hard-to-audit behavior - it’s also slow when working across hundreds of servers.
When scoping the initial project work with the team in Kyoto, we agreed that the entire deployment experience had to be first class. We needed to be able to iterate on the product quickly, at Railway speed, without the anxiety of potentially impacting 1M RPS at...