How Caching and Optimized Context Works : AIDungeon
This post was submitted on 15 Jun 2026.
How Caching and Optimized Context Works
New Features (self.AIDungeon)
submitted 2 days ago by seaside-rancher, VP of Experience - announcement
Two of this year's most exciting additions to AI Dungeon have been the introduction of Cache-Efficient models and the "Optimized Context" setting. When AI models are optimized for caching, they are significantly cheaper to run. Those savings let us give you up to 2x the context length compared to models that aren't optimized for caching, so more of your AI Dungeon or Voyage adventure gets seen and considered by the AI model, preserving important story details and delivering better story continuity.
KV caching (the correct technical term for the LLM caching used for "Optimized Context" on AI Dungeon and Voyage) is a deeply technical concept, and many of you are interested in how it works and how it impacts your experience. We're going to share how it works and clear up some misconceptions we've seen in our community. Let's dive in!
How LLMs Work (a refresher)
Fully explaining how Large Language Models work is beyond the scope of this post, but we need to touch on a few fundamental concepts first. If they are new to you, you may find it helpful to explore them further on your own.
Every time you take a turn on Voyage or AI Dungeon, the text you input for your turn is combined with other information (like AI Instructions, Plot Components, and Story Cards for AI Dungeon—or state and task information for Voyage) to create the context that gets sent to the AI. The language model performs a series of calculations on the context to generate the output we display in AI Dungeon and Voyage.
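As a rough illustration of the paragraph above, here is a minimal sketch of how a context might be assembled each turn. The function name `build_context`, its parameters, and the ordering of the sections are all assumptions made for this example; the post does not describe how AI Dungeon or Voyage actually orders or formats these pieces.

```python
def build_context(player_input, ai_instructions="", plot_components=(),
                  story_cards=(), history=()):
    """Join the pieces of a turn into one block of text for the model.

    Hypothetical helper: the section order used here is an assumption,
    not the real layout used by AI Dungeon or Voyage.
    """
    sections = [
        ai_instructions,
        *plot_components,
        *story_cards,
        *history,
        player_input,  # the text entered for this turn goes last
    ]
    # Skip empty sections so the context has no blank gaps.
    return "\n\n".join(section for section in sections if section)


context = build_context(
    "You open the door.",
    ai_instructions="Be vivid.",
    story_cards=["Mira: a wary innkeeper."],
    history=["You arrive at the inn at dusk."],
)
```

The model never sees your input on its own. It sees the whole combined text, which is why everything that goes into the context matters for continuity.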
Behind the scenes, your input is converted into tokens (numerical representations of word fragments) through a process called tokenization. Each token is then looked up in a giant table, a step called embedding, which assigns it a vector (a list of numbers) that conveys all possible meanings of that token.
For example, the word "bank" can mean "a place money is kept" or "the land along the edge of a river". The vector captures all of those possibilities. The next phase narrows them down to the one you meant.
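To make tokenization and embedding concrete, here is a toy sketch. It makes several simplifying assumptions: it splits on whitespace so each token is a whole word (real tokenizers use word fragments), the vectors are random rather than learned, and every function name is invented for this example.

```python
import random


def build_vocab(texts):
    """Assign each distinct word an integer ID (toy word-level vocabulary)."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def tokenize(text, vocab):
    """Convert text into a list of token IDs."""
    return [vocab[word] for word in text.lower().split()]


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a lookup table with one vector per token ID.

    Random values stand in for the learned vectors of a real model.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, table):
    """Look up the vector for each token ID."""
    return [table[token_id] for token_id in token_ids]


vocab = build_vocab(["she sat on the river bank", "he robbed the bank"])
table = make_embedding_table(len(vocab), dim=4)

river_ids = tokenize("the river bank", vocab)
money_ids = tokenize("he robbed the bank", vocab)
river_vecs = embed(river_ids, table)
money_vecs = embed(money_ids, table)

# "bank" gets the identical vector in both sentences at this stage.
same_vector = river_vecs[-1] == money_vecs[-1]
```

Because the lookup ignores the surrounding words, "bank" starts with the same vector in both sentences. Telling the two meanings apart is left to the next phase.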
The next step is to pass these vectors through the transformer, which works in a series of layers. Here's a useful way to picture it. Think of each token's vector as a block of uncarved granite. Just as a block of stone contains every possible statue, the vector contains every possible meaning of the token. The transformer's job is to carve away everything the token doesn't mean in this particular sentence.
Like a sculptor, it works in passes. The early layers make rough, broad cuts, establishing basic structure—which words are nouns,...