Agentic Search Models with OpenSearch and Elasticsearch

Bonsai Blog


June 5, 2026

Max Irwin

Guide, SID, Agents, Relevance

13 min read

Tuning search is tricky, and the tools of yesterday are good but require lots of effort and data to get right. In this post I'm going to introduce purpose-built agentic LLMs for searching and reranking, which are an easy drop-in solution for relevance improvement.

Specifically, I got preview access to the SID-1 model, which I'll demonstrate after introducing the problems it is meant to solve. I'll walk you through implementing the model in an accessible search experience, building on an existing search application I made for my last post for the Gutenberg project corpus.

SID-1 is our first model for agentic retrieval: 1.9x more likely to surface the right results than embedding-only search across general knowledge, finance, science, legal, and email. It is more accurate than agentic retrieval based on Gemini 3 Pro, Sonnet 4.5, and GPT-5.1 at its highest compute setting, while being 24x faster (144 vs 5.7 seconds).

SID.ai

Before diving into the post, let's start with a quick demo. Try a couple of searches, observe what's happening, and then keep that experience in your head as I unfold the details in the rest of the article.

This demo is actually usable on this page. Try it out!

Try the interactive search above, or go to a full demo!

Problems

I always like to start my posts with a list of problems we have. And here are some really challenging gaps that exist in pretty much every hybrid search solution:

The best results might not be at the top

Noisy/irrelevant results might pollute the page

The user's query might not match the corpus/domain language

Vector search has no cutoff (everything is a "match")

In search, each of these is the deepest of rabbit holes. I've cumulatively spent YEARS trying to solve these problems in various products. SID and other multi-turn retrieval models do a great job of generalizing a solution path that gets you most of the way there.
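To make the last problem in that list concrete, here is a minimal sketch of a brute-force nearest-neighbor search. The three-dimensional vectors and document names are made up for illustration, and a real engine would use an approximate index rather than a full scan, but the behavior is the same in spirit: ask for k results and you get k results, even when the last one has nothing to do with the query.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn(query, docs, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy embeddings (hypothetical values, not from a real model).
docs = {
    "on-topic": [0.9, 0.1, 0.0],
    "loosely-related": [0.5, 0.5, 0.5],
    "off-topic": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]

results = knn(query, docs, k=3)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The "off-topic" document scores 0.0 against this query and is still returned as the third "match". A keyword query with no overlapping terms would simply return nothing, which is the cutoff that vector search lacks by default.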

LLMs to the rescue?

Generally, the recent consensus has been "the AI should do it". But finding the best way for AI to solve these problems has been a rough road.

We tried using LLMs for judgements, which is hit or miss. Even before ChatGPT was around, I ran up against this paradox in February 2022:

But when automatically judging every result pair at query time and reranking, we quickly ran into "oh my goodness this is very slow and expensive" problems... especially as pair-wise judgements are where we found the best effectiveness.
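A bit of arithmetic shows why pair-wise judging gets expensive so fast. The sketch below counts the comparisons needed to judge every pair in a result list, and turns that into a rough latency estimate. The 0.5 seconds per LLM call is an assumption picked purely for illustration, not a measurement.

```python
from itertools import combinations

def pairwise_judgement_count(num_results):
    """Number of unordered result pairs: n * (n - 1) / 2."""
    return num_results * (num_results - 1) // 2

def estimated_latency_seconds(num_results, seconds_per_call, parallelism=1):
    """Rough wall-clock estimate if each pair needs one LLM call."""
    calls = pairwise_judgement_count(num_results)
    batches = -(-calls // parallelism)  # ceiling division
    return batches * seconds_per_call

# Sanity check the formula against an explicit enumeration of pairs.
explicit_pairs = list(combinations(range(10), 2))

print(pairwise_judgement_count(10))   # 45 pairs for a 10-result page
print(pairwise_judgement_count(20))   # 190 pairs for 20 results
print(estimated_latency_seconds(10, seconds_per_call=0.5))  # 22.5 (assumed call time)
```

Doubling the page size roughly quadruples the number of calls, so even generous parallelism only delays the problem.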

So lots of teams just start with RAG: tune a prompt to summarize the results, and deprioritize the results themselves. I've had some grievances with RAG for a while - summaries are hard to wrangle and evaluate. Also, while RAG fills a certain need, sometimes we just want the actual search results. We're smart and we want to look at source information and decide for ourselves. Not to mention that RAG gets in the way of discovery - one of the best parts of search.

Agentic retrievers as a solution

One pattern that has emerged recently is tool-based search aka agentic retrieval aka multi-turn retrieval. You've seen this pattern before running in the background of your coding agent or chat - several web searches get executed by the LLM to grab source material, then the best results are pulled out and used to bolster the context and improve the outcome.

This solution has several characteristics which solve the problems listed above. The agent using this technique does the following:

Writes several variants on a query to improve recall

Executes the queries and gathers the results

Picks the best results from all of the responses (and ignores noise)

Iterates on the process based on what it found

We call each iteration of this process a "turn" in agent lingo.

After several turns, you have a short-list of really good results. Agents like ChatGPT or Claude can sometimes take 7 or 8 turns on this process.
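The loop described above can be sketched in a few lines. This is a generic outline and not SID's implementation: the `plan`, `search`, and `pick` callables are hypothetical stand-ins for "the model writes queries", "the search engine runs one query", and "the model keeps the good ids". In the toy run, they are hard-coded stubs so the example runs without any model or cluster.

```python
from typing import Callable, List

def agentic_retrieve(
    user_query: str,
    plan: Callable[[str, List[str], int], List[str]],  # -> queries for this turn
    search: Callable[[str], List[str]],                # query -> doc ids
    pick: Callable[[str, List[str]], List[str]],       # candidates -> ids to keep
    max_turns: int = 3,
) -> List[str]:
    """Run up to max_turns of: write queries, execute, keep the best ids."""
    kept: List[str] = []
    for turn in range(1, max_turns + 1):
        queries = plan(user_query, kept, turn)
        if not queries:  # the planner has enough context, stop iterating
            break
        candidates: List[str] = []
        for q in queries:
            for doc_id in search(q):
                if doc_id not in candidates:
                    candidates.append(doc_id)
        for doc_id in pick(user_query, candidates):
            if doc_id not in kept:
                kept.append(doc_id)
    return kept

# --- Toy stubs (all hypothetical) ---
INDEX = {
    "ancient egypt novel": ["b1", "b2"],
    "roman empire fiction": ["b2", "b3"],
    "prehistoric tribes story": ["b4"],
}

def toy_search(q):
    return INDEX.get(q, [])

def toy_plan(user_query, kept, turn):
    if turn == 1:
        return ["ancient egypt novel", "roman empire fiction"]
    if turn == 2:
        return ["prehistoric tribes story"]
    return []

def toy_pick(user_query, candidates):
    return [c for c in candidates if c != "b3"]  # pretend b3 is noise

result = agentic_retrieve(
    "novels inspired by early cultures", toy_plan, toy_search, toy_pick
)
print(result)  # ['b1', 'b2', 'b4']
```

Passing the already-kept ids back into `plan` is what lets each turn build on the previous one instead of repeating it.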

SID will usually only take 2 or 3 turns, and then goes one step further with a specialized reranking turn. If the query is particularly long (I experimented with entire paragraphs as well), SID will take 6 to 7 turns at most for retrieval, and 1 turn for reranking. The reranker will look at the final list, and reorder the results for relevance based on everything it gathered along the way.

Here's an illustrated example. In a Gutenberg books corpus, we search for the query "novels inspired by early cultures". I'm not showing the actual results here, because first it's important to understand the process.

Turn 1: The model wrote 4 queries, we execute them all and gather ids

Turn 2: The model wrote 2 more queries, we execute them all and gather ids

Turn 3: The model has enough context and chooses the reranking "report_helpful_ids" tool.
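Here is one way the application side of those three turns might look. Only the "report_helpful_ids" tool name comes from the walkthrough above. The dict shape of a turn, the "search" tool name, and the scripted trace are assumptions for illustration, not the SID API. The one design choice worth noting is that the final ordering is filtered against the ids we actually gathered, so an id the model made up never reaches the results page.

```python
from typing import Callable, Dict, List, Optional

def handle_turn(
    turn: Dict,
    gathered: List[str],
    run_query: Callable[[str], List[str]],
) -> Optional[List[str]]:
    """Apply one model turn. Returns the final ranking on a rerank turn, else None."""
    if turn["tool"] == "search":  # hypothetical tool name
        for q in turn["queries"]:
            for doc_id in run_query(q):
                if doc_id not in gathered:
                    gathered.append(doc_id)
        return None
    if turn["tool"] == "report_helpful_ids":
        seen = set(gathered)
        ranked: List[str] = []
        for doc_id in turn["ids"]:
            # Keep the model's order, drop unseen or duplicate ids.
            if doc_id in seen and doc_id not in ranked:
                ranked.append(doc_id)
        return ranked
    raise ValueError(f"unknown tool: {turn['tool']}")

# Scripted stand-ins for the search engine and the model (hypothetical data).
FAKE_HITS = {
    "q1": ["d1", "d2"], "q2": ["d2", "d3"], "q3": [], "q4": ["d4"],
    "q5": ["d5"], "q6": ["d1", "d6"],
}
trace = [
    {"tool": "search", "queries": ["q1", "q2", "q3", "q4"]},   # turn 1
    {"tool": "search", "queries": ["q5", "q6"]},               # turn 2
    {"tool": "report_helpful_ids", "ids": ["d5", "d1", "d9", "d3"]},  # turn 3
]

gathered: List[str] = []
final: Optional[List[str]] = None
for turn in trace:
    final = handle_turn(turn, gathered, lambda q: FAKE_HITS.get(q, []))
    if final is not None:
        break

print(gathered)  # ['d1', 'd2', 'd3', 'd4', 'd5', 'd6']
print(final)     # ['d5', 'd1', 'd3']  ('d9' was never gathered, so it is dropped)
```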

Note...
