Context on Context - by David Manheim - David’s Substack
Context on Context<br>Where "context" for language models comes from, and where it's going.
David Manheim<br>Jul 01, 2026
Context windows for LLMs are both a narrow technical topic and a broad conceptual one. The distinction matters because agentic systems can now choose, compress, retrieve, and share their context. That means the environment is also a latent future context window in the technical sense, blurring that sense with the conceptual idea of context. This brief writeup is an attempt to provide a basic conceptual basis for the topic, especially when thinking about risks of agentic systems.
A Brief History of Context
We can start by contrasting the prehistory of LLMs and the realm of robotics. Thirty years ago, Yoshua Bengio noted that learning long term dependencies with gradient descent is hard - and the long-term dependencies in question are analogous to, but distinct from, context windows. At the time, these dependencies were learned internal states of the model, frozen in the AI winter - but AI had started to thaw by the time we were told, a little over 20 years later, that attention is all you need. That paper introduced transformer models, which moved to attention as a mechanism and in some ways introduced the idea of conditioning a model on a sequence of tokens.<br>The ground was set for the revolution, but it was a couple more years before anyone realized (in both the conceptual and implementation senses) that language models are unsupervised multitask learners. This was the first demonstration of large language models' ability to do unsupervised learning across tasks. It also “prompted” discussions of in-context behavior - making the context window into something more than just inputs to a model.<br>But the advance also highlighted how critical Bengio’s problem remained; the new architectures paid for longer context with quadratically more attention computation, with further tradeoffs between the costs of hidden layers and training context. This kept context windows very small - but several generations of improvement, each taking many months and moving from RoPE to YaRN (via ALiBi), enabled longer context.
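To make the cost of longer context concrete, here is a rough sketch of how the attention computation in one standard dense self-attention layer grows with sequence length. The function name and the default head sizes are illustrative assumptions, not figures from any particular model, and the count ignores projections, softmax, and feed-forward layers.

```python
def attention_multiply_adds(seq_len: int, head_dim: int = 64, num_heads: int = 12) -> int:
    """Approximate multiply-adds for dense self-attention in a single layer.

    Hypothetical helper for illustration only; head_dim and num_heads are
    assumed defaults, not taken from a specific model.
    """
    # Scores: every token attends to every token, one dot product of length head_dim each.
    scores = seq_len * seq_len * head_dim
    # Weighted sum over values: same shape of work as the score computation.
    weighted_values = seq_len * seq_len * head_dim
    return num_heads * (scores + weighted_values)


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, attention_multiply_adds(n))
```

Under these assumptions, doubling the context length roughly quadruples this part of the cost, which is one reason early context windows stayed short.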
Unfortunately, the capabilities of models with longer context windows were Lost in the Middle, so that more context did not functionally improve the models as much as one might hope.<br>Those remaining challenges have since been largely addressed - but before we reach the premodern era of reasoning and retrieval models, we turn to the vastly different but closely related domains of robotics and reinforcement learning.<br>In robotics, context refers to the more plain-language idea of the environment around the system, including latent properties of the environment and of the task. Robotics usually distinguishes between state, observation, and context; the state covers the robot’s own system state, position, and other status, while the observations provide clues about the outside world and context. That is, a robot can access context information, but it’s not necessarily reflected inside the model’s state or computations. On the other hand, the internal model principle requires effective systems to have some implicit mapping between relevant environmental factors and internal states.<br>In reinforcement learning, researchers formalize the idea slightly more; context is the set of facts about the environment that determine how the model's state and actions map to the reward it receives. That is, context is side information which may or may not map onto system states, but for RL to maximize reward, it must eventually be represented, at least implicitly, inside the model.<br>But with increasing attention to the overlap of LLMs and RL, we move towards modern LLMs, including reasoning models, agents, and retrieval-augmented generation.
Context in Protoagentic and Agentic AI
Now that we have presented the context of context for modern AI, we need to provide the technical context for those systems. First, reasoning models moved from using context as input and output to having hidden context, with thinking tokens. This presaged and paralleled the advent of Retrieval-Augmented Generation, where a model can retrieve information and insert that information into its context. The combination of these two was augmented by providing additional capabilities to the systems integrating language models into AI agents.<br>But even before discussing agentic AI, there was a shift in what context was. In earlier LLM systems, as well as in robotics and RL, context was provided, not manipulated. The use of reasoning tokens, in contrast, intentionally changes the content of the context window in order to enable and augment in-context reasoning. This is even more true with retrieval-augmented systems, where information that was neither provided by a user nor generated by the LLM is inserted into the context window.<br>None of this changed the technical nature of...