Linear elastic caching reduced Spanner's memory use by 15.5%

p_stuart821 pts0 comments

Optimizing cloud economics with linear elastic caching

Skip to main content

Google

Research

Search

Optimizing cloud economics with linear elastic caching

June 25, 2026<br>Todd Lipcon, Distinguished Engineer, Google Cloud, and Manish Purohit, Research Scientist, Google Research

Linear elastic caching minimizes total cache cost by framing page eviction as a ski rental problem, using lightweight machine learning to optimize the trade-off between memory footprint and cache misses.

Quick links

Paper

Share

Copy link

Modern high-performance database systems and cloud services rely on in-memory caching to keep frequently accessed data in RAM to bypass slow disk operations and deliver the lightning-fast response times users expect. But this performance comes with a cost (literally): high-speed memory is expensive, and some serverless cloud providers charge up to $3 per day for just 1 GiB of memory.<br>Historically, cache management has been treated as a fixed-resource problem. With regular, fixed-sized caching, engineers allocate a specific amount of memory for the cache and the system uses eviction policies like least recently used (LRU) replacement to decide which data to keep when that space runs out. This leads to a classic “Goldilocks” problem: size the cache too small and performance plummets; size it too large for peak demand, and you waste thousands of dollars on idle memory.<br>In a paper published at the Conference on Innovative Data Systems Research (CIDR), we introduced linear elastic caching, a new approach designed to minimize the total cost of ownership (TCO) of cache management by dynamically adjusting cache size in response to real-time workloads. Instead of treating memory as a fixed, pre-allocated resource, we treat it as an utility whose cost is linear in both the size of the cached data and the duration for which it is held in the cache. By treating memory footprint as a variable cost that integrates over time, we showed that we can significantly reduce expenses without compromising system performance.

The "ski rental" approach to memory<br>To solve the challenge of dynamic cache sizing, let’s use the classic ski rental problem. Imagine you’re on a ski trip of unknown length. Each day, you face a choice: rent skis for a small daily fee or buy them for a larger upfront cost and ski for free thereafter. If you knew exactly how many days you would ski, the choice would be easy. But without that knowledge, you need an algorithm to minimize your total spend.

Similarly, in a linear elastic cache, every piece of data faces a comparable dilemma. When a piece of data is accessed, the system must decide between two alternatives:<br>"Rent" the space: Keep the data in RAM and pay a continuous cost for the memory it occupies.<br>"Buy" the miss: Evict the data to save memory costs, but risk a "buy" cost (the latency and I/O penalty) if the data is needed again soon.<br>At the same time, the system cannot optimize for each piece of data independently since the cache has a maximum allocated size (think of a large group of people at a ski resort, where the resort only has a limited number of skis to offer). Our core theoretical contribution proves that we can optimize these two factors — the eviction policy and the "rental" duration — separately. This separation lends itself nicely into a clean practical implementation. We can use a ski rental algorithm to determine the time-to-live (TTL) of a page (analogous to the rental duration). If a page isn’t accessed again before its TTL expires, it is automatically evicted. But if the cache ever becomes physically full, a traditional eviction policy like least recently used (LRU) steps in to manage the space.<br>Traditional online algorithm design focuses on providing worst-case performance guarantees. For the ski rental problem, the classic “break-even” algorithm is to rent until the accumulated cost equals the purchase price, and then buying the skis. While this approach (and its randomized counterpart) provide solid worst-case guarantees, production workloads are mostly predictable. Data access in systems like Spanner — our globally distributed database — often follows discernible patterns that can be exploited to make better renting decisions.

Testing linear elastic caching<br>To ensure our theory holds up in the real world, we conducted extensive experiments using two primary sources:<br>Production workloads: We integrated the system into Spanner.<br>Public traces: We tested against a variety of publicly available cache traces from industry benchmarks to ensure the results weren't specific to Google’s infrastructure.<br>Production workloads<br>We developed a practical algorithm that assigns a time-to-live (TTL) to the cached page on each page request based on the page’s access patterns and costs. Because Spanner handles billions of requests per second, this TTL prediction model has to be incredibly lightweight. We opted for a shallow decision tree that can be translated into a few lines...

cache memory data cost linear caching

Related Articles