AI Dungeons: How Caching and Optimized Context Works


This post was submitted on 15 Jun 2026. 33 points (90% upvoted).


New Features (self.AIDungeon), submitted 2 days ago by seaside-rancher, VP of Experience (announcement)

Two of this year's most exciting additions to AI Dungeon have been the introduction of Cache-Efficient models and the "Optimized Context" setting. When AI models are optimized for caching, they are significantly cheaper to run. Those savings let us give you up to 2x the context length compared to models that aren't optimized for caching, so more of your AI Dungeon or Voyage adventure gets seen and considered by the AI model, preserving important story details and delivering better story continuity.

KV caching (the correct technical term for the LLM caching used for "Optimized Context" on AI Dungeon and Voyage) is a deeply technical concept, and many of you are interested in how it works and how it impacts your experience. We're going to share how it works and clear up some misconceptions we've seen in our community. Let's dive in!

How LLMs Work (a refresher)

A full explanation of how Large Language Models work is beyond the scope of this post, but we need to touch on a few fundamental concepts. If these are new to you, you may find it helpful to explore them on your own.

Every time you take a turn on Voyage or AI Dungeon, the text you input for your turn is combined with other information (like AI Instructions, Plot Components, and Story Cards for AI Dungeon—or state and task information for Voyage) to create the context that gets sent to the AI. The language model performs a series of calculations on the context to generate the output we display in AI Dungeon and Voyage.
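To make the "combined with other information" step concrete, here is a minimal sketch in Python. It is an illustration only: the `TurnInputs` container, the `build_context` function, and the ordering of the sections are all assumptions made for this example, not a description of how AI Dungeon or Voyage actually assembles context.

```python
from dataclasses import dataclass, field


@dataclass
class TurnInputs:
    """Hypothetical container for the pieces named above (AI Dungeon flavor)."""
    ai_instructions: str
    plot_components: str
    story_cards: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)
    player_input: str = ""


def build_context(turn: TurnInputs) -> str:
    """Join the pieces into one block of text, skipping empty sections.

    The section order here is an assumption chosen for illustration.
    """
    sections = [
        turn.ai_instructions,
        turn.plot_components,
        "\n".join(turn.story_cards),
        "\n".join(turn.history),
        turn.player_input,
    ]
    return "\n\n".join(s for s in sections if s)


turn = TurnInputs(
    ai_instructions="You are the narrator of a fantasy adventure.",
    plot_components="The kingdom is at war.",
    story_cards=["Mira: a ranger who distrusts magic."],
    history=["You enter the tavern."],
    player_input="> You ask the innkeeper about the war.",
)
context = build_context(turn)
print(context)
```

The point of the sketch is only that the model never sees your turn in isolation: it sees one combined block of text, rebuilt every turn.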

Behind the scenes, your input is converted into tokens (numerical representations of word fragments) through a process called tokenization. Each token is then looked up in a giant table, a process called embedding. The embedding assigns each token a vector (a list of numbers) that conveys all possible meanings of that token.
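The two steps above can be sketched with a toy example. Everything here is invented for illustration: the five-entry vocabulary, the three-number vectors, and the greedy longest-match tokenizer are assumptions, and real models use far larger tables and more involved tokenization schemes.

```python
# Toy vocabulary: word fragment -> token ID (all values invented for illustration).
VOCAB = {"the": 0, "river": 1, "bank": 2, "er": 3, "s": 4}

# Toy embedding table: one row of numbers (a vector) per token ID.
EMBEDDINGS = [
    [0.1, 0.0, 0.2],  # "the"
    [0.9, 0.1, 0.0],  # "river"
    [0.5, 0.5, 0.1],  # "bank"
    [0.0, 0.2, 0.7],  # "er"
    [0.0, 0.1, 0.9],  # "s"
]


def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenizer over VOCAB (a simplification of real tokenizers)."""
    ids = []
    for word in text.lower().split():
        start = 0
        while start < len(word):
            # Try the longest fragment first, then shrink until one is in the vocabulary.
            for end in range(len(word), start, -1):
                piece = word[start:end]
                if piece in VOCAB:
                    ids.append(VOCAB[piece])
                    start = end
                    break
            else:
                raise ValueError(f"no token covers {word[start:]!r}")
    return ids


def embed(token_ids: list[int]) -> list[list[float]]:
    """Embedding is a table lookup: token ID -> row of the table."""
    return [EMBEDDINGS[i] for i in token_ids]


token_ids = tokenize("The river bankers")
vectors = embed(token_ids)
print(token_ids)  # "bankers" splits into the fragments "bank", "er", "s"
```

Note that the lookup gives "bank" the same vector no matter what sentence it appears in. At this stage the vector knows nothing about the surrounding words.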

For example, the word "bank" can mean "a place where money is kept" or "a geological feature" (such as a riverbank). The vector captures all of those possibilities. The next phase narrows them down to the one you meant.

The next step is to pass these vectors through the transformer, which works in a series of layers. Here's a useful way to picture it. Think of each token's vector as a block of uncarved granite. Just as a block of stone contains every possible statue, the vector contains every possible meaning of the token. The transformer's job is to carve away everything the token doesn't mean in this particular sentence.
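The carving picture can be illustrated with a deliberately crude toy. This is not how a transformer computes anything: the two-number "meaning" directions, the `refine` function, and the fixed blending rate are all assumptions invented for this example. Each pass simply blends the token's vector with a vector standing in for the surrounding sentence, so you can watch an ambiguous "bank" drift toward one meaning.

```python
import math

# Invented 2-D "meaning" directions, purely for illustration.
MONEY = [1.0, 0.0]
RIVER = [0.0, 1.0]


def cosine(a, b):
    """Cosine similarity: 1.0 means the two vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def refine(vector, context, layers=4, rate=0.5):
    """Blend the vector toward the context once per pass.

    A crude stand-in for a stack of layers, not real attention.
    Returns the vector after every pass, starting with the original.
    """
    history = [vector]
    for _ in range(layers):
        vector = [(1 - rate) * v + rate * c for v, c in zip(vector, context)]
        history.append(vector)
    return history


bank = [0.5, 0.5]     # starts equally close to both meanings
context = [0.1, 0.9]  # stands in for a sentence about rivers
steps = refine(bank, context)

for i, v in enumerate(steps):
    print(i, round(cosine(v, RIVER), 3), round(cosine(v, MONEY), 3))
```

With each pass the similarity to the river meaning rises and the similarity to the money meaning falls, which is the toy version of carving away what the token does not mean in this sentence.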

Like a sculptor, it works in passes. The early layers make rough, broad cuts, establishing basic structure—which words are nouns,...
