PEEK: Give Your Agent an Orientation Cache (MIT CSAIL, Khattab group)


PEEK: Give Your Agent an Orientation Cache - Zhuohan’s Homepage

We introduce PEEK, a system that caches reusable orientation knowledge about a recurring external context as a small, prompt-resident context map.

12 minute read

Paper: https://arxiv.org/abs/2605.19932
Code: github.com/zhuohangu/peek

tl;dr

LLM agents such as Claude Code, Codex, RLM, and Hermes Agent increasingly operate over long and recurring external contexts: document corpora, code repositories, and other resources that the agent queries again and again but that live outside the LLM’s context window. This capability is now referred to as Grounded Reasoning. Existing approaches preserve the agent’s trajectory, passive access to raw materials, or task-level strategies. None of them preserves what we argue is most needed for repeated same-context workloads: reusable knowledge about the context itself.

PEEK fills this gap by caching a context map: a compact, constant-sized artifact that sits inside the agent’s prompt and stores what the agent has learned about the external context itself across interactions. The map is maintained by a programmable cache policy with three modules: a Distiller that extracts transferable knowledge from execution trajectories, a Cartographer that translates that knowledge into structured edits, and a priority-based Evictor that enforces a fixed token budget.

Across long-context tasks, PEEK improves quality while using fewer iterations and lower cost than strong baselines, including the state-of-the-art prompt-learning framework ACE, and sits on the cost-quality Pareto frontier. Figure 1 shows a preview of the paper, including a benchmark snapshot. PEEK also generalizes across base LMs (both open-source and proprietary) and agent architectures, including OpenAI Codex, a production-grade coding agent.

Figure 1: Paper Snapshot.
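To make the three-module cache policy from the tl;dr more concrete, here is a minimal Python sketch of the Distiller, Cartographer, and Evictor loop. It is not the authors' implementation: the post describes the modules only by role, so everything below is an assumption made for illustration. In particular, the `NOTE|key|text|priority` trajectory format, the whitespace token counter, the `Entry`/`Edit`/`ContextMap` classes, and the priority values are all hypothetical, and the Distiller and Cartographer are reduced to simple string-handling placeholders.

```python
from dataclasses import dataclass, field


def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace word count, not a real LM tokenizer.
    return len(text.split())


@dataclass
class Entry:
    key: str
    text: str
    priority: float  # higher means more worth keeping (hypothetical scoring)


@dataclass
class Edit:
    op: str  # "upsert" or "delete"
    key: str
    text: str = ""
    priority: float = 0.0


@dataclass
class ContextMap:
    budget: int  # fixed token budget for the whole map
    entries: dict = field(default_factory=dict)

    def tokens(self) -> int:
        return sum(count_tokens(e.text) for e in self.entries.values())

    def apply(self, edits) -> None:
        # Apply structured edits produced by the Cartographer.
        for ed in edits:
            if ed.op == "upsert":
                self.entries[ed.key] = Entry(ed.key, ed.text, ed.priority)
            elif ed.op == "delete":
                self.entries.pop(ed.key, None)

    def evict(self) -> list:
        # Evictor: drop lowest-priority entries until the map fits the budget.
        evicted = []
        while self.entries and self.tokens() > self.budget:
            victim = min(self.entries.values(), key=lambda e: e.priority)
            del self.entries[victim.key]
            evicted.append(victim.key)
        return evicted

    def render(self) -> str:
        # Text that would sit inside the agent's prompt, highest priority first.
        ordered = sorted(self.entries.values(), key=lambda e: -e.priority)
        return "\n".join(f"- {e.key}: {e.text}" for e in ordered)


def distill(trajectory):
    # Placeholder Distiller: keep only steps tagged as notes (hypothetical format).
    facts = []
    for step in trajectory:
        if step.startswith("NOTE|"):
            _, key, text, priority = step.split("|", 3)
            facts.append((key, text, float(priority)))
    return facts


def cartograph(facts):
    # Placeholder Cartographer: turn each distilled fact into an upsert edit.
    return [Edit("upsert", key, text, priority) for key, text, priority in facts]


def update_map(cmap, trajectory):
    cmap.apply(cartograph(distill(trajectory)))
    return cmap.evict()


cmap = ContextMap(budget=12)
evicted = update_map(cmap, [
    "READ|chunk 17",
    "NOTE|layout|entries are grouped by product area|0.9",
    "NOTE|sso|SSO complaints cluster in enterprise tier|0.7",
    "NOTE|misc|chunk 17 had mostly duplicates|0.2",
])
print(cmap.render())
print("evicted:", evicted)
```

In this toy run the three notes total 17 whitespace tokens against a budget of 12, so the lowest-priority note (`misc`) is evicted and the two higher-priority notes remain in the rendered map. A real policy would presumably use LM calls for distillation and edit generation and a proper tokenizer for the budget; the sketch only shows how the three roles compose.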
Prelude: A Cache Hiding in Plain Sight

I asked my GPT Image 2 to visualize the situation I found myself in when I first started my PhD...

Over the past 3–5 years, context management has become one of the most important areas of progress for large language model (LLM) systems. We have seen long context windows, retrieval, context compaction, context offloading, prompt learning (also called context engineering or context adaptation), memory management, and KV-cache management (systems work I did as an undergrad at UChicago/LMCache Lab and Tensormesh, where our team was among the first to optimize KV-cache reuse beyond GPU HBM for LLMs).

When I started my PhD at MIT last September, I found myself standing at a crossroads, unsure which way to go: had the AI community already exhausted every obvious axis of context management, or was there still a systems idea hiding in plain sight?

The setting that made the answer feel concrete was agentic workloads. Recent agents, whether general-purpose assistants or coding agents, increasingly operate over long and recurring external contexts: document corpora, code repositories, enterprise records, and other resources that the agent queries again and again. I believe this pattern will only become more common.

As a student who started doing CS research in a systems group, I suddenly realized that we might be missing one of the oldest tricks in computer science: a cache. Not a KV cache, and not just a vector database, but a genuine agent-side cache.
The intuition is simple: reserve a small portion of the language model's (LM's) context window for a little blurb that is never compacted away, never externalized into environment storage, and can be revised over time.

Then the real question becomes: how do we decide what should go in that blurb?

Consider an enterprise analyst repeatedly querying 50,000+ user-feedback entries:

Do users prefer feature A or feature B?
What onboarding complaints appear most often?
Which enterprise customers mention SSO problems?
Are complaints concentrated in one product area?

The corpus stays mostly fixed, but the tasks change. A human analyst would not start each question from scratch. After a few passes, they might keep a lightweight table of contents, memos about key entities and constants, a record of which regions have been inspected, and records of common intermediate results. An agent facing the same setting needs an analogous aid: not the whole corpus in the prompt, and not just a memory of previous chat turns, but a small maintained view of the external context that helps it re-enter the same context more intelligently each time.

What Existing Context Management Misses

Figure 2: Design Space of Agent State. Context maps fill the active external-context quadrant.

Modern agentic systems already manage long contexts in several useful ways, but each preserves a different kind of object, and none preserves what we argue is most needed for repeated same-context workloads:

Shared chat carries prior...
