Incident Report: May 19, 2026 - GCP Account Suspension
Chandrika Khanduri & Cody De Arkland<br>May 20, 2026<br>🚅<br>This report reflects what we know at time of publication and may be updated pending Google Cloud's internal review.
Railway experienced a platform-wide service disruption after Google Cloud incorrectly placed our account in a suspended status. This resulted in a temporary loss of service for all GCP-hosted infrastructure, which supports our dashboard, API, and parts of our network infrastructure. As cached network routes expired, the outage extended beyond GCP to affect all Railway workloads.<br>Below, we walk through what happened, how we responded, and what we're doing to prevent a similar incident in the future.<br>Impact
On May 19, 2026, between 22:20 UTC and approximately 06:14 UTC on May 20 (~8 hours), Railway experienced a platform-wide outage after Google Cloud suspended services on our production account. This took our API, control plane, and databases offline, along with compute infrastructure hosted on Google Cloud.<br>Users immediately experienced 503 errors on the dashboard and API, including "no healthy upstream" and "unconditional drop overload" messages, and were unable to log in. All workloads hosted on Google Cloud compute were taken offline.<br>While workloads on our own Railway Metal and AWS burst-cloud environments remained up, Railway's edge proxies rely on a Google Cloud-hosted control plane API to populate their routing tables, which caused the outage to cascade beyond Google Cloud. As the route caches expired, these other workloads became unreachable and requests to them returned 404 errors, because the network control plane could no longer resolve routes to active instances. At peak impact, all Railway workloads across all regions were unreachable.<br>As we recovered our Google Cloud environment, builds and deployments were blocked platform-wide while we restored the individual services. Once all of our infrastructure was restored, we drained a significant backlog of queued deploys gradually to avoid overwhelming the platform. In parallel, GitHub began rate-limiting Railway's OAuth and webhook integrations, temporarily blocking logins and builds. The volume of these calls had increased because our caches were cleared during the Google Cloud outage.
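The routing cascade described earlier in this section can be illustrated with a small simulation. This is a minimal sketch, not Railway's actual proxy code: the `EdgeProxy` class, the `fetch_route` callback, the host and upstream names, and the 60-second TTL are all hypothetical, chosen only to show how a proxy that depends on a remote control plane keeps serving from its cache until entries expire and then starts returning 404.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


class ControlPlaneDown(Exception):
    """Raised when the (hypothetical) control plane API is unreachable."""


@dataclass
class CachedRoute:
    upstream: str
    expires_at: float


@dataclass
class EdgeProxy:
    # fetch_route stands in for the call to the control plane API.
    fetch_route: Callable[[str], str]
    ttl_seconds: float  # illustrative value only, not a real setting
    cache: Dict[str, CachedRoute] = field(default_factory=dict)

    def resolve(self, host: str, now: float) -> int:
        """Return an HTTP-style status code for a request to `host`."""
        entry = self.cache.get(host)
        if entry is not None and now < entry.expires_at:
            return 200  # fresh cached route: traffic keeps flowing
        try:
            upstream = self.fetch_route(host)
        except ControlPlaneDown:
            self.cache.pop(host, None)  # drop the expired entry
            return 404  # no route can be resolved
        self.cache[host] = CachedRoute(upstream, now + self.ttl_seconds)
        return 200


def simulate() -> List[int]:
    control_plane_up = True

    def fetch_route(host: str) -> str:
        if not control_plane_up:
            raise ControlPlaneDown(host)
        return "metal-host-1"  # hypothetical upstream name

    proxy = EdgeProxy(fetch_route=fetch_route, ttl_seconds=60)
    statuses = [proxy.resolve("app.example", now=0)]         # route cached
    control_plane_up = False                                 # provider outage
    statuses.append(proxy.resolve("app.example", now=30))    # still cached
    statuses.append(proxy.resolve("app.example", now=61))    # cache expired
    return statuses


if __name__ == "__main__":
    print(simulate())
```

In this sketch the workload behind `app.example` never goes down. Only the route lookup fails, which mirrors how workloads outside Google Cloud stayed up but became unreachable.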
As a side effect, terms-of-service acceptance records were also reset, prompting users to re-accept on their next visit to the dashboard.<br>We take full responsibility for the architectural decisions that allowed a single upstream provider action to cascade into a platform-wide outage. Below, we detail what happened, how we recovered, and the changes we are making to prevent this from happening again.<br>Incident Timeline
May 19, 22:10 UTC - Our automated monitoring detected API health check failures and paged our on-call engineers, who started investigating the issue.<br>May 19, 22:11 UTC - Dashboard returning 503 errors. Users unable to log in.<br>May 19, 22:19 UTC - Root cause identified: Google Cloud Platform had suspended Railway's production account.<br>May 19, 22:22 UTC - P0 ticket filed with Google Cloud. Railway's GCP account manager engaged directly.<br>May 19, 22:29 UTC - Incident declared.<br>May 19, 22:29 UTC - GCP account access restored. All compute instances remained stopped and persistent disks inaccessible.<br>May 19, 22:35 UTC - Cached network routes began expiring; workloads on Railway Metal and AWS began returning 404 errors as the network control plane could no longer resolve routes.<br>May 19, 23:09 UTC - First persistent disk came back online.<br>May 19, 23:54 UTC - All persistent disks restored to ready state. Network still down.<br>May 20, 00:39 UTC - Disks confirmed ready. Recovery blocked on Google Cloud networking restoration.<br>May 20, 01:30 UTC - Compute instances began recovering.<br>May 20, 01:38 UTC - Networking restored. Edge traffic being served again.<br>May 20, 01:57 UTC - Orchestration and build infrastructure restored. Deploys temporarily paused to prevent queued work from overwhelming systems by executing simultaneously.<br>May 20, 02:04 UTC - Compute hosts being brought back online incrementally.<br>May 20, 02:47 UTC - GitHub began rate-limiting Railway's OAuth and webhook integrations; some users unable to log in, builds blocked.<br>May 20, 02:55 UTC - Dashboard accessible again.<br>May 20, 03:59 UTC - Deployments began processing again across all tiers.<br>May 20, 04:00 UTC - API, dashboard, and OAuth endpoints confirmed operational. Remaining workloads continuing to restore.<br>May 20, 06:14 UTC - Incident moved to monitoring.<br>May 20, 07:58 UTC - Incident resolved.<br>What Happened?
At 22:20 UTC on May 19, Google Cloud incorrectly placed Railway’s production account into a suspended status as part of an automated action. This action extended to many accounts within Google Cloud. Because it was a platform-wide action, there was no proactive outreach to individual customers before the restriction.<br>This suspended status disabled our GCP related...