The next generation of Metal: Railway Metal Gen 2
Mark Imbriaco<br>Jun 2, 2026<br>If you've deployed something on Railway in the last few weeks, there's a non-trivial chance your build, your stacker, or your storage is sitting on hardware that didn't exist on our network six months ago.<br>We've been cutting over a new generation of Metal underneath the platform.<br>5x more compute capacity<br>5x more network throughput<br>And two more nines of capacity stability, meaning less hot spotting for demanding workloads.<br>We onboarded more compute in Q1 2026 than in all of 2025.<br>This is what it is, why we did it, and where it's landed so far.<br>Getting to the Metal
Many people on Railway know that we run our own hosts on our own hardware. We’ve been doing this since 2024, and at scale since 2025, and it’s a practice we plan to continue.<br>Railway started in a single GCP region in 2020 and was in four by 2023. But, as our founder Jake Cooper likes to remind people, you can’t build a cloud on another cloud. We were losing $20 for every dollar that came in.<br>We care about being a fundamentally good business, both to avoid the mistakes of other PaaSes and to deliver the best experience for our customers.<br>Our Gen 1 hosts were dual-socket Intel Xeons with terabytes of DDR4 and modest CPU. Also, let’s be honest, Charith and Christian ordered these from a couch at our 2024 Mallorca retreat based on some napkin math. We didn’t know how they’d perform until we ran them in production. This was before I joined Railway, but I can only salute the commitment to forward progress.<br>In 2024 the typical Railway service was a Node webapp or a small Python service that wanted lots of RAM and not much else.<br>Then 2025 happened … and we had three new issues to deal with.<br>More serious workloads<br>Think vector databases, scrapers, agentic loops, and inference proxies. The median Railway service now wants a real fraction of a CPU. Our internal CPU:RAM demand ratio shifted from 1:20 toward 1:8. The Xeons were not built for that, and their AVX-512 path was nerfed (read: only two AVX compute units for the entire machine) badly enough that LLM workloads in particular were a non-starter on Gen 1. We apologize to anyone who wanted to run Llama 3.<br>The memory market<br>From October 2025 to January 2026, RAM prices roughly tripled. Vendors stopped honoring orders. 
The compute we had on the floor was suddenly worth a lot more than what it cost to put there, and the compute we still wanted to buy was either unavailable or three times the unit price we'd planned for.<br>Demand curve going parabolic<br>Then, driven by agents, we onboarded 4x more demand in Q1 2026 than we did in the entirety of 2025. To put that in perspective, a fully loaded new Gen 1 site only buys us about three months of capacity runway.<br>Just three months. We were going to run out before we could build the next one.<br>So we redesigned what a Railway site is, and started designing the next generation of Metal.<br>Where to put the box
Around June 2025, after the success of our Gen 1 Railway Metal deployment and the price cut we delivered to users, we were confident that we would be able to fully book further capacity. So, much like building a custom gaming rig that would run Unreal at max specs, Charith went shopping.<br>The logistics of this are challenging. Supply chains for all of the components of a modern server are stretched these days, meaning prices go up and delivery schedules get murky. And once the servers are ready to be delivered, you need to have enough space and power in your datacenters to install and run them.<br>Our Gen 1 sites were tapped out, so we worked to secure four new sites near our existing Metal regions, with multiples of the power and space of our original sites. We were around a year into operating our sites, so we had plenty of lessons to incorporate into our own datacenter design and operational processes.<br>Gen 2: Stronger, Faster, Denser
At the end of 2025, we knew that demand was growing, but even we didn’t fully appreciate how rapidly the curve would become a vertical line. That said, we made the decision to focus on right-sizing density to reduce the risk of noisy neighbors, provide maximum IOPS and I/O for our customers, and radically expand the networking throughput of our machines.<br>(All based on feedback from our Enterprise customers.)<br>We don’t usually publish specs, but we moved to the latest-generation AMD Zen 5c EPYC CPUs with 96 cores (192 threads), DDR5, 5x more storage than Gen 1, and dual 100G ConnectX-6 NICs. All in the same chassis as our Gen 1 storage server, so we get to consolidate from four SKUs down to two. Having run both Supermicro’s X13 and H13 platforms for over a year, we were privy to the failure rates and firmware quirks of both platforms. The H13s were rock solid and far more efficient, so we doubled down. We learnt the hard way to stay a generation behind on platform after our then brand new...