Agents Have (Information) Needs

Information retrieval is about satisfying an information need, but a query is a poor stand-in for that need. Your agent is capable of expressing its information need directly, so you should probably use it.

By Joe Barrow 2026-05-13

TL;DR

Information retrieval is about satisfying an information need. An agent’s reasoning traces reveal a lot about its information need, so we probably shouldn’t throw them away. Instead, we should build a good retrieval system that can account for them.

One thing that seems to get lost in the hubbub around harness design is why we give agents access to retrieval tools. We give agents retrieval tools so they can satisfy some information need.

A sufficiently persistent user, given access to even basic tools, will be able to satisfy their information need through repeated querying and refining. Recent work like Direct Corpus Interaction (DCI) [1] shows that agentic retrieval systems can do a pretty good job of finding things with just grep and bash. (It just happens to take 2x the number of queries.)

What does it mean that an agent has an information need? Information need is an old idea in information retrieval: when you go to execute a Google search, your query is just a compromise you have to make to satisfy your information need. What you’re actually searching for is your information need, which is latent in your mind.

A good example of “information need” comes from old-school TREC queries. They were called “TDN” queries, for “Title, Description, Narrative.” Beyond a short title, the queries include a long-form description of the key aspects of the topic, and a narrative describing relevant documents.

Here’s an example from the TREC 1999 Ad Hoc dataset [2].

Title: osteoporosis

Description: Find information on the effects of the dietary intakes of potassium, magnesium and fruits and vegetables as determinants of bone mineral density in elderly men and women thus preventing osteoporosis (bone decay).

Narrative: A relevant document may include one or more of the dietary intakes in the prevention of osteoporosis. Any discussion of the disturbance of nutrition and mineral metabolism that results in a decrease in bone mass is also relevant.
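To make the gap between the short query and the full information need concrete, here is a minimal sketch of the topic above as a data structure. The `Topic` class, its field names, and the `short_query` / `full_query` helpers are my own illustrative names, not part of any TREC tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    """A TDN-style topic. Field names are illustrative assumptions."""
    title: str
    description: str
    narrative: str

    def short_query(self) -> str:
        # What a keyword search box typically sees: the title alone.
        return self.title

    def full_query(self) -> str:
        # The fuller statement of the information need.
        return " ".join([self.title, self.description, self.narrative])


topic = Topic(
    title="osteoporosis",
    description=(
        "Find information on the effects of the dietary intakes of "
        "potassium, magnesium and fruits and vegetables as determinants "
        "of bone mineral density in elderly men and women thus "
        "preventing osteoporosis (bone decay)."
    ),
    narrative=(
        "A relevant document may include one or more of the dietary "
        "intakes in the prevention of osteoporosis. Any discussion of "
        "the disturbance of nutrition and mineral metabolism that "
        "results in a decrease in bone mass is also relevant."
    ),
)

print(topic.short_query())
print(len(topic.full_query().split()), "words in the full query")
```

The title is a single word; everything that says what a relevant document looks like lives in the other two fields.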

Agents similarly have an information need they’re seeking to satisfy when they issue and refine queries. When you ask an agent “will fruits and vegetables help with my osteoporosis,” and it starts searching for “osteoporosis fruits and vegetables” and “bone density vegetables,” it’s issuing and refining queries to satisfy some information need.

But those queries are a compromise. They are a retrieval-system-friendly representation of the actual information need. The agent can infer a lot more about the user’s information need, like the fact that the user is asking about their own health, probably wants actionable information, etc.

A BRIGHT Idea

At some point the retrieval community agreed to not use the Descriptions or Narratives. But what if we just… let the agent express its own information need? And used that to inform our search?

BRIGHT is a “benchmark for reasoning-intensive retrieval,” where traditional lexical retrievers don’t perform well. Consider the LeetCode examples in BRIGHT: find problems whose solutions share an algorithm with this sample problem.

In my opinion, one of the key results from BRIGHT [3] and subsequent work is that reasoning makes for better queries. From the paper:

BRIGHT p. 8

Querying with LLM reasoning steps improves retrieval performance. […] using Llama-3-70B or GPT-4 reasoning steps as queries significantly improves performance compared to the original query
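A toy sketch of why this can happen, using term overlap as a stand-in for a real lexical retriever. The two "documents," the query, and the reasoning string are all invented for illustration; none of this is BRIGHT data, and the scoring function is far cruder than BM25.

```python
import re


def terms(text: str) -> set[str]:
    """Distinct lowercase word tokens in a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank(query: str, docs: dict[str, str]) -> list[str]:
    """Doc ids ordered by how many distinct terms they share with the query."""
    q = terms(query)
    return sorted(docs, key=lambda d: len(q & terms(docs[d])), reverse=True)


# Invented toy corpus: one doc shares surface wording with the query,
# the other shares the underlying algorithm.
docs = {
    "sorting": "sort the array then find the two largest numbers",
    "hash_map": "store each value in a hash map and look up its complement",
}

query = "find two numbers in an array that add up to a target"
# Invented stand-in for an LLM reasoning step about the query.
reasoning = "use a hash map to look up the complement of each value"

print(rank(query, docs))                    # surface wording wins
print(rank(query + " " + reasoning, docs))  # shared algorithm wins
```

The raw query matches the document that merely talks about arrays and numbers. Once the reasoning is appended, the query names the algorithm, so the document that shares it moves to the top.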

And follow-on work, like Reason-ModernColBERT from Antoine Chaffin [4] or ReasonIR [5], shows that we can actually build this right into our retrieval system! By training the retrieval system with reasoning data in the queries, and using that reasoning data in the harness, you see pretty substantial gains in agentic retrieval performance.

Basically, that’s a 10%+ bump on BrowseComp-Plus, an agentic retrieval benchmark, from using a retriever that allows the agent to express its information need.

What does this look like? Well, an awful lot like a TDN query! Here’s the prompt the BRIGHT authors used to elicit reasoning:

(1) Identify the essential problem in the post.
(2) Think step by step to reason about what should be included in the relevant documents.
(3) Draft an answer.

Your harness can prompt not only for the “query,” but for the reasoning behind the query, and what makes an effective document. Reason-ModernColBERT and Agent-ModernColBERT show that this can provide wins over simpler search tools. DCI showed that agents can execute long-horizon search tasks to satisfy information needs. But if we account for those information needs to begin with, we can get better retrieval, and better results!
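As a rough sketch of what that could look like in a harness, here is a search tool that takes the query, the reasoning, and a description of relevant documents, and passes all three to the retriever. Everything here is hypothetical: `InformationNeed`, `make_search_tool`, the field labels, and the stub retriever are my own names, and a reasoning-trained retriever may expect a different input format.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InformationNeed:
    """What the agent is asked to fill in when it calls search (hypothetical)."""
    query: str               # the short, search-box style query
    reasoning: str           # why the agent is searching
    relevant_documents: str  # what a useful document would contain

    def to_retriever_input(self) -> str:
        # Assumed format: labeled sections joined by newlines.
        sections = [
            ("Query", self.query),
            ("Reasoning", self.reasoning),
            ("Relevant documents", self.relevant_documents),
        ]
        return "\n".join(
            f"{label}: {text.strip()}" for label, text in sections if text.strip()
        )


def make_search_tool(
    retrieve: Callable[[str], List[str]],
) -> Callable[[InformationNeed], List[str]]:
    """Wrap any text-in, documents-out retriever as a tool for the agent."""
    def search(need: InformationNeed) -> List[str]:
        return retrieve(need.to_retriever_input())
    return search


# Stub retriever that records what it was asked, standing in for a real one.
seen: List[str] = []


def stub_retrieve(text: str) -> List[str]:
    seen.append(text)
    return []


search = make_search_tool(stub_retrieve)
search(InformationNeed(
    query="osteoporosis fruits and vegetables",
    reasoning="The user is asking about their own health and wants actionable advice.",
    relevant_documents="Studies linking dietary intake to bone mineral density.",
))
print(seen[0])
```

The point of the design is that the reasoning never gets thrown away between the agent and the retriever; what the retriever does with it depends on how it was trained.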

References

Zhuofeng Li, Haoxiang Zhang, Cong...
