the mathematics of multi-tenancy - by almog gavra
the mathematics of multi-tenancy<br>why multi-tenancy works for S3, but might not work for you
almog gavra<br>Jun 01, 2026
There was an interesting thread on X the other day where an S3 customer’s bucket migration resulted in a swarm of 503 errors coming from S3. The root cause came down to the way S3 prevents users from explicitly defining partitions. That decision surprised me since, as an infrastructure engineer, I very much like to have control over what subsets of my data are co-partitioned.<br>I started pulling on that thread and the result is a statistical model that helps better understand multi-tenancy.<br>why S3 doesn’t let you control partitioning
S3 documents that a partitioned prefix can handle ~5k GET requests/s, but users don’t have control over the physical partition placement. Instead, S3 learns your workload patterns and then uses statistics to evenly distribute frequently accessed data.<br>I recently re-read Building and operating a pretty big storage system called S3 by Andy Warfield and there’s a section in that post that expertly visualizes why they do this. While individual users’ workloads are spiky, aggregating the workloads of millions of tenants results in a relatively flat and predictable usage pattern:
The catch is that this only works when workload spikes are uncorrelated. If you place all of the data from a particular user on the same machines, you don’t get any amortization. Similarly, if you place users with correlated usage patterns on the same machines, you get the opposite of the effect you want. By predicting workload patterns, S3 can place workloads it believes to be uncorrelated on the same disks.<br>In theory this makes a lot of intuitive sense, and I believe it works for S3. But despite working at companies with massive scale (LinkedIn & Confluent), I was never able to get anywhere near this degree of aggregate flattening in my own systems.<br>modeling workload distribution
To build an intuition for why, I built a statistical model (that’s deployed as a webpage you can play around with) that has a few knobs that affect how the individual workloads are distributed. The key output of the model is the heat ratio H between the peak workload and the average workload:<br>\(\text{H} = \frac{\text{max}(x)}{\bar{x}}\)
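As a minimal sketch of the heat ratio defined above, assuming a workload is represented as a plain list of load samples over time (the function name `heat_ratio` and the sample values are illustrative, not taken from the model's source):

```python
from statistics import fmean


def heat_ratio(load):
    """Return H = max(x) / mean(x) for a sequence of load samples."""
    avg = fmean(load)
    if avg <= 0:
        raise ValueError("mean load must be positive")
    return max(load) / avg


# Hypothetical samples: one bursty workload and one perfectly flat one.
spiky = [1, 1, 1, 1, 16]  # mean 4, peak 16
flat = [4, 4, 4, 4, 4]    # mean 4, peak 4

print(heat_ratio(spiky))  # 4.0
print(heat_ratio(flat))   # 1.0
```

Both workloads do the same total work, but the spiky one has to be provisioned for four times its average.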
There are various inputs to the model, but first we’ll focus on just the population size N (how many distinct workloads are running).<br>an idealized multi-tenant setup
We’ll start with a baseline simulation. In this simulation all workloads are randomly generated with no correlation between them, so each contributes roughly equally to the overall aggregate. This simulation shows the aggregate workload as you add more individual workloads to the fleet:
This demonstrates exactly what we want to see from a multi-tenant system. As we add more workloads to the pool (N increases), the ratio between peak utilization and average utilization shrinks to almost exactly 1.0.<br>This is great. It means that any individual tenant running on their own would need to provision for peak, but a multi-tenant system hardly needs to over-provision.<br>A foundational argument in support of multi-tenancy is that it enables a vendor to provide a service to you at lower cost than if you ran it yourself. In other words, your workload has some value of H that is sufficiently larger than the vendor’s H once they aggregate the entire workload, so they can still take margin despite charging you less than the cost of self-hosting. For the sake of this blog, let’s assume that the economics work out when the vendor’s H ≤ 2 (their peak workload is no more than twice their average).<br>This graph plots H on the y-axis against the number of tenants on the system for this setup:
With the parameters of this simulation, the vendor can cross the modeled breakeven threshold with about 200 tenants running on their system.
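The independent-workload case can be approximated in a few lines. This is a rough sketch rather than the model behind the graphs: it assumes each tenant's load at each time step is an independent draw from an exponential distribution with mean 1, and the function name, step count, and seed are arbitrary choices made for illustration.

```python
import random
from statistics import fmean


def simulate_heat_ratio(n_tenants, n_steps=200, seed=0):
    """Estimate H for the aggregate load of n_tenants independent workloads."""
    rng = random.Random(seed)
    # Assumption: per-tenant load at each step is an i.i.d. exponential draw.
    # The aggregate load at a step is the sum over all tenants.
    aggregate = [
        sum(rng.expovariate(1.0) for _ in range(n_tenants))
        for _ in range(n_steps)
    ]
    return max(aggregate) / fmean(aggregate)


# H should fall toward 1.0 as independent tenants are pooled together.
for n in (1, 10, 100, 1000):
    print(f"N={n:>4}  H={simulate_heat_ratio(n):.2f}")
```

The exact tenant count at which H drops below a given threshold depends on how spiky the assumed per-tenant distribution is, so this sketch should not be expected to reproduce the figure of about 200 tenants.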
correlation corrodes multi-tenant efficiency
The previous simulation assumed that workloads were completely independent from one another.<br>In practice, few companies have this luxury, since many factors can correlate workloads: customers in the same time zone are likely to be more active during business hours, and customers with shared seasonality (e.g. big shopping holidays) will spike together.<br>We can model correlation by splitting the RNG into two components: a shared one used by all workloads and an independent one generated for each workload. We introduce a new variable ρ to represent how much of each value comes from the shared RNG versus the independent one. When ρ=0 the shared component is 0% of the value, and when ρ=1 the independent component is 0% of the value.<br>It doesn’t matter how many customers you have, if their workloads are correlated you will never get the nice...