Bigger context windows won't save your agent

Rahul Baboota

SubscribeSign in

Rahul Baboota May 20, 2026

When I started building agents, I used to think that bigger context windows would make agent design easier. They definitely help but they don’t solve the problem I thought they would solve. I’ve spent the last 3 years building agents and one thing has stayed constant: No matter how large the context window gets, you still have to care a lot about what goes into it. A 64K context window had this problem.

A 200K context window had this problem.

A 1 million context window HAS this problem.

I believe that if you’re involved in building agents, you need to be obsessed with how to engineer the context of your agent. IMO, it is one of the highest leverage things you can do while designing agents. More Context is Not Better Context

Chroma did a comprehensive study last year and found that model performance starts to degrade significantly as you keep stuffing more into the context window. This is known as Context Rot.

This observation matches what I (and numerous other people) have seen while using/building agents. The issue is not just whether the model can fit the context. It is whether the context given to it is useful or not. As agents are becoming more capable, they’re also getting access to far more context. It’s not just a prompt and a bunch of tools anymore. Agents today have access to entire codebases, 10s if not 100s of tools and much more. The available context is just growing. But your agent does not need the entire context all at once. This is the part that makes context engineering valuable (and hard). For every task, an agent has access to a large amount of total context, but only some of it is required. The role of context engineering is to design the system so the retrieved context is as close as possible to the needed context. That is the whole game.

After building different types of agents over the past 3 years and seeing how leading AI labs tackle this problem, these are the three techniques that help the most: 1/ Context Offloading

The idea here is simple: If something is too large or is low-signal to keep in active context, offload it somewhere else and give the agent a pointer to it. In most cases, that somewhere else is a filesystem. That’s because modern agents are extremely good at file operations. They can search, grep, and read targeted sections of a file whenever needed. Context offloading is very common for tool inputs and outputs. If a tool input/output exceeds a defined threshold, you can simply dump the result to a file and replace it with a pointer to that file.

You can also apply offloading to older tool results. If a tool input/output is quite old relative to the current conversation, that can be compacted as well although it does mess up with the prompt cache so it is not as straightforward. This is what Claude’s tool result clearing feature does. The main thing with context offloading is that it is always designed to be restorable. 2/ Context Summarization

However, there is only so much that you can do with context offloading. As the agent will keep doing more work, its context window will start filling up. So, when the context size reaches its context limit and there is no more context eligible for offloading, you have to perform context summarization. This process usually has 2 components: Summary : An LLM generates a structured summary of the conversation and starts a new session with this summary as context. For example, in Claude Code the summary is created using a structured 9-section prompt covering things like files, code, errors and problem solving.

Offload Conversation : The complete, original conversation messages are still preserved by writing to a file which the agent can again look up via file search ops.

This dual approach ensures the agent maintains awareness of its goals and progress (via the summary) while preserving the ability to recover specific details when needed (via filesystem search). 3/ Context Isolation

Subagents solve the context bloat problem differently. I used to think that subagents were mostly a way to parallelize work. That’s true, but also incomplete. Subagents are also a context isolation mechanism. So, whenever your agent has to perform a task where the intermediate exploration does not matter and only the final result does, using subagents is the way to go. You might have noticed that Claude Code deploys a subagent when you ask it to search for how something works in a codebase.

This is the perfect use for a subagent. You don’t want to pollute the context window of your main agent with the exploratory steps that Claude will take to answer your query. So, all that works happens in a fresh context window with only the final response returned back to the main agent. Conclusion

The more I build agents, the more important I think context engineering is going to be. As I...

Bigger context windows won't save your agent

Related Articles

Elevated error rates on requests to multiple models

Donald Trump and sons to be 'forever' exempt from tax audits

PopuLoRA: Co-Evolving LLM Populations for Reasoning Self- Play

Old Reddit Is Down

The ultimate female fantasy – A feminist critique of Beauty and the Beast