Avoid Hasty Caching

Avoid Hasty Caching | Jake Worth

Approach caching with caution.

In this post, I’d like to describe some important considerations that I think teams should weigh before applying a caching solution. I’ve found:

Caching is often applied too early.

Caching can be a slippery slope.

Caching is hard to do right.

Often Applied Too Early

Caching is often applied too early. It’s wise to try the alternatives first.

Here’s an example: suppose we have an order-management system that fetches all the orders, and it’s performing slowly. Should we cache it?

We could. But first, here are some good questions.

Is our query optimized on the frontend, API layer, and data layer?

Could we apply a default filter to limit results? Perhaps showing recent orders, rather than all?

Could we be lazy? Implement client-side lazy loading, or server-side pagination?

These all concern the question we are asking of our data. We want that question to be as efficient as possible before we try to optimize how frequently we ask it.
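As a rough illustration of asking a smaller question, here is a minimal sketch of a default "recent orders" filter combined with server-side pagination. Everything in it is hypothetical: the `Order` type, the `fetch_orders` function, the 30-day window, and the in-memory list standing in for a real data layer are assumptions made for the example, not part of any particular system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Order:
    id: int
    created_at: datetime


def fetch_orders(orders, now, recent_days=30, page=1, page_size=50):
    """Return one page of recent orders, newest first (hypothetical helper)."""
    cutoff = now - timedelta(days=recent_days)
    # Default filter: only consider orders inside the recent window.
    recent = [o for o in orders if o.created_at >= cutoff]
    recent.sort(key=lambda o: o.created_at, reverse=True)
    # Server-side pagination: return only the requested slice.
    start = (page - 1) * page_size
    return recent[start:start + page_size]
```

In a real application the filter and the slice would usually be pushed down into the database query rather than done in memory, but the shape is the same: the client receives a bounded page instead of every order.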

On many teams, caching is applied before these options are explored.

Can Be a Slippery Slope

A second challenge is that caching can be a slippery slope: once you cache, it’s easy to cache again.

Back to our example: now we’ve done some caching, and we’re building a page that shows all the customers who use our product. Before we do any benchmarking, what tool is likely to be applied to that query? Caching. Why? Because that’s what we use for APIs in this codebase. If it’s even a deliberate decision (and many times it isn’t, thanks to well-meaning copy-pasta), it might not be a reasoned one.

Ideally, when a new API is added, the merits of caching are discussed in the open from first principles. Is the complexity of adding caching to this endpoint justified, right now?

Hard To Do Right

A third challenge is that caching is hard.

Phil Karlton taught us that cache invalidation is one of the two hard problems in computer science. Caching is additional complexity. Invite complexity into your codebase with caution.

Back to our example app. Consider a security issue. We discover that our customer page leaks sensitive PII. Without caching, we can fix the issue and force a page reload for all clients. With caching, we must ensure that we invalidate every cache, everywhere.

And whoops, we’ve gone for it: perhaps we have caching at the data layer, API layer, and frontend. Can you invalidate all those caches while maintaining data integrity and a decent user experience?
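To make the layered-invalidation problem concrete, here is a toy sketch with three read-through caches stacked in front of a database. The `CacheLayer` and `Database` classes, the key name, and the `ssn` field are all invented for illustration; real caches (CDNs, HTTP caches, in-process stores) behave differently in detail, but the failure mode is similar.

```python
class Database:
    """Stand-in for the real data store."""

    def __init__(self):
        self.rows = {}

    def get(self, key):
        return self.rows[key]


class CacheLayer:
    """A read-through cache sitting in front of another source."""

    def __init__(self, name, source):
        self.name = name
        self.source = source  # anything with a .get(key) method
        self.store = {}

    def get(self, key):
        if key not in self.store:
            self.store[key] = self.source.get(key)
        return self.store[key]

    def invalidate(self, key):
        self.store.pop(key, None)


db = Database()
db.rows["customer:1"] = {"name": "Ada", "ssn": "000-00-0000"}

data_cache = CacheLayer("data", db)
api_cache = CacheLayer("api", data_cache)
frontend_cache = CacheLayer("frontend", api_cache)

frontend_cache.get("customer:1")  # warms all three layers

# The fix: the record no longer carries the sensitive field.
db.rows["customer:1"] = {"name": "Ada"}

# Clearing only two of the three layers is not enough.
data_cache.invalidate("customer:1")
api_cache.invalidate("customer:1")
still_leaking = "ssn" in frontend_cache.get("customer:1")  # True

# Only after every layer is cleared do readers see the fix.
frontend_cache.invalidate("customer:1")
fixed = "ssn" not in frontend_cache.get("customer:1")  # True
```

Even in this toy version, one forgotten layer keeps serving the old record. In a real system the layers are often owned by different teams or live on users' devices, which is what makes "every cache, everywhere" hard.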

Security issues aside, caching can be a vector for sneaky bugs that are hard both to find and to reproduce. It’s another place where data is stored, another tool to manipulate.

And it can create UX problems when users expect data to be updated more often than it is. Yes, customers want fast applications, but not at the expense of a coherent user experience.
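The stale-data problem can be sketched with a simple time-to-live cache. This is an assumption-laden toy: `TTLCache`, the 60-second TTL, the fake clock, and the order-status example are all hypothetical, chosen only to show how a user can keep seeing an old value after the underlying data has changed.

```python
class TTLCache:
    """Cache entries for a fixed number of seconds (illustrative only)."""

    def __init__(self, ttl_seconds, clock):
        self.ttl = ttl_seconds
        self.clock = clock   # callable returning the current time in seconds
        self.entries = {}    # key -> (value, stored_at)

    def get(self, key, load):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]    # served from cache, possibly stale
        value = load()       # cache miss or expired entry: reload
        self.entries[key] = (value, now)
        return value


fake_now = [0.0]                       # controllable clock for the demo
order_status = {"value": "pending"}    # stand-in for the database
cache = TTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
load = lambda: order_status["value"]

first = cache.get("order:1", load)     # "pending"

order_status["value"] = "shipped"      # the real data changes
fake_now[0] = 30.0
stale = cache.get("order:1", load)     # still "pending"

fake_now[0] = 61.0
fresh = cache.get("order:1", load)     # "shipped"
```

For the first minute after the update, the user who just marked the order as shipped still sees "pending". Whether that window is acceptable is a product decision, not only a performance one.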

Conclusion

I feel drawn to this argument because caching is so widespread and accepted. A tool can be “blazing-fast” and still be wrong for your team, today.

Sometimes, you gotta cache. Make sure that it isn’t being applied too early, that it isn’t spreading through your codebase, and that you’re trying hard to do it right.

Cache away, once you’ve exhausted other options and are willing to accept the tradeoffs.
