The AWS Service Quotas That Will Take Down Your Production at 3 AM (And You Cannot Raise Them Fast Enough) | by Illya Yalovoy | May, 2026 | Medium
The AWS Service Quotas That Will Take Down Your Production at 3 AM (And You Cannot Raise Them Fast Enough)
Illya Yalovoy
Hard limits, scaling lags, and the architectural walls that no support ticket can fix.

Your pager fires at 3:12 AM. Users are getting 503s. You check the dashboard: Lambda is throttling at 1,000 concurrent executions. You open a quota increase request and see the estimated response time: 1–3 business days. It is Saturday. This is the moment you learn the difference between a soft limit and a hard wall.

The Two Categories of AWS Limits That Matter

Every AWS account ships with two kinds of limits, and the difference between them is the difference between a minor inconvenience and a 3 AM architecture redesign.

The first kind is adjustable quotas: Lambda concurrent executions (default 1,000 per region), EC2 on-demand vCPUs per instance family, API Gateway requests per second. You can request an increase through the Service Quotas console or a support ticket. This sounds reassuring until you learn the timeline: most increases take 1–3 business days. Some, like GPU instances or dedicated hosts, take weeks. If your traffic spike is happening right now, a support ticket is not a solution. It is a post-mortem action item.

The second kind is hard limits. A NAT Gateway caps at 55,000 simultaneous connections per destination. S3 gives you 5,500 GET and 3,500 PUT requests per second per prefix. DynamoDB on-demand tables will…
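To make the figures quoted so far concrete, here is a minimal Python sketch of a headroom check. It is not from the article: the function names, the 80% warning threshold, and the use of Little's law (in-flight executions ≈ request rate × average duration) to estimate Lambda concurrency are all assumptions chosen for illustration. The limits are the defaults cited above, and your account's actual quotas may differ.

```python
# Limits as quoted in the article; real values vary by account and region.
LAMBDA_CONCURRENCY_DEFAULT = 1_000        # adjustable quota
S3_GET_PER_SECOND_PER_PREFIX = 5_500      # per-prefix request rate
S3_PUT_PER_SECOND_PER_PREFIX = 3_500      # per-prefix request rate
NAT_CONNECTIONS_PER_DESTINATION = 55_000  # simultaneous connections

# Hypothetical alerting threshold (an assumption, not an AWS value).
WARN_AT = 0.8


def required_concurrency(requests_per_second: float,
                         avg_duration_seconds: float) -> float:
    """Estimate in-flight executions with Little's law: rate x duration."""
    return requests_per_second * avg_duration_seconds


def headroom_status(demand: float, limit: float,
                    warn_at: float = WARN_AT) -> str:
    """Classify demand against a limit as OK, WARN, or THROTTLED."""
    if demand > limit:
        return "THROTTLED"
    if demand / limit >= warn_at:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # 500 requests/s at 2.5 s each needs about 1,250 concurrent executions,
    # which exceeds the default quota of 1,000.
    needed = required_concurrency(500, 2.5)
    print(needed, headroom_status(needed, LAMBDA_CONCURRENCY_DEFAULT))

    # 3,000 PUT requests/s against a single S3 prefix.
    print(headroom_status(3_000, S3_PUT_PER_SECOND_PER_PREFIX))
```

Running a check like this against projected peak traffic, rather than current traffic, is what leaves time for a quota increase to be processed before the spike arrives.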