Travellers in a Foreign Land


Travellers in a Foreign Land - Ruben Flam-Shepherd

Though it did not grab the zeitgeist like "Vibe Coding", Andrej Karpathy tried to coin the term "Jagged Intelligence" to describe how LLMs can "both perform extremely impressive tasks...while simultaneously struggle with some very dumb problems." He'd been beaten to this concept by Dell'Acqua and colleagues, who used the term "Jagged Technological Frontier" in their 2023 paper. And while both of these parties have beaten me to this particular punch, I will ape their ideas to offer my own incremental contribution, because they are both describing a line demarcating capability. A boundary. A border on a map of a foreign land that we are all travellers in: the land of agentic capabilities.

The Map and its Parts

I've found this to be a useful metaphor for thinking about agentic capabilities. We illuminate parts of the landscape by issuing tasks to agents and then evaluating their success. This iterative cycle (give the agent a task, evaluate the output, update our map) is the basic building block through which our understanding of agent capabilities develops. Our map consists of three parts:

Charted Territory: Tasks we expect the agent to either fail or accomplish because they resemble tasks that the agent has failed or accomplished for us in the past.

The Borderlands: Tasks that either 1) agents succeed at only intermittently or 2) resemble things the agent has tried before but differ enough that we are unsure of the outcome.

The Wilds: Tasks that do not resemble anything we've asked the agent to do. They are far enough outside our understanding of agentic capabilities that we don't have a good sense of the likelihood of success or failure.
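One way to make the three regions concrete is to keep a log of outcomes per kind of task and read the region off that log. The sketch below is illustrative only: the CapabilityMap class, the min_trials and consistency thresholds, and the choice to treat "resembles a past task" as an exact match on a task label are all assumptions of mine, not something the cycle above prescribes.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class CapabilityMap:
    """Hypothetical log of agent outcomes, keyed by a task label."""

    # task label -> list of observed outcomes (True means the agent succeeded)
    outcomes: dict = field(default_factory=lambda: defaultdict(list))
    min_trials: int = 3        # assumed: fewer trials than this is too little evidence
    consistency: float = 0.8   # assumed: share of identical outcomes that counts as "charted"

    def record(self, kind: str, success: bool) -> None:
        """Update the map after evaluating one task."""
        self.outcomes[kind].append(success)

    def region(self, kind: str) -> str:
        """Classify a task label into one of the three regions."""
        results = self.outcomes.get(kind, [])
        if not results:
            return "The Wilds"            # nothing like it has been tried
        if len(results) < self.min_trials:
            return "The Borderlands"      # tried, but outcome still uncertain
        rate = sum(results) / len(results)
        # Reliable success and reliable failure are both charted.
        if rate >= self.consistency or (1 - rate) >= self.consistency:
            return "Charted Territory"
        return "The Borderlands"          # intermittent success


# One pass through the cycle: give a task, evaluate it, update the map.
capability_map = CapabilityMap()
capability_map.record("generate-chart", True)
print(capability_map.region("generate-chart"))
```

Note that a task the agent reliably fails is classified as charted too, which matches the definition above: what matters is that we know what to expect.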

Tasks in Charted Territory are our workhorses. They are how we leverage agents to increase our output. This is landscape we've navigated before, so we know what to expect.

The Borderlands is the (jagged) frontier that separates what we know about agent capabilities from what we don't know. This is where we push our knowledge of agent capabilities forward. The more time we spend here, the more territory we chart and the better our understanding becomes. Spending time here is important, but past a certain point it has diminishing returns: we're not really doing work in this space so much as establishing the boundary that demarcates which kinds of work can be reliably accomplished. The release of new models redraws this boundary; the Borderlands become fuzzy. What is not possible today may become possible tomorrow.

The Wilds are unknown. Sometimes we might ask the agent to do something out of pocket because the cost of trying is low, but the expectation is failure. However, this is where we discover entirely new frontiers of agent capabilities.

Here are some things I've discovered as I've built out my map:

Send Multiple Scouts

One easy way to improve the throughput of the iterative cycle described above is to ask the agent to implement multiple versions of a given task in a single prompt. This allows us to explore an area instead of a single point. Some examples:

Agents struggle with tasks of the "take this data and generate a useful chart" variety. Sometimes they succeed, but often the chart is broken in some way. Given the intermittent nature of this task, it lives in The Borderlands. When useful charts are generated, it's worth exploring how we can get useful results more consistently. In Agentic Data Analysis with Claude Code we generate three SQL queries per table and four charts per query. This allows us to explore twelve different versions of the table -> query -> chart flow for each table. With this expanded search space we can notice what works and update our prompts accordingly.

When designing UI elements agents will often present a variety of approaches for the user to choose from. Instead of choosing a single option, ask the agents to implement all of them behind different temporary endpoints. This lets us make a decision based on the actual end product (instead of a text description) and allows us to further evaluate if the agent can actually implement the given UI it's proposed.
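The chart example above can be sketched as a small enumeration: three queries times four charts gives twelve labelled slots per table, all requested in a single prompt. The function names, the label scheme, the prompt wording, and the "orders" table are hypothetical and only illustrate the idea; they are not taken from Agentic Data Analysis with Claude Code.

```python
from itertools import product


def scout_variants(table: str, n_queries: int = 3, n_charts: int = 4) -> list:
    """Enumerate one labelled slot per (query, chart) pair for a table."""
    return [
        {"table": table, "label": f"query-{q}/chart-{c}"}
        for q, c in product(range(1, n_queries + 1), range(1, n_charts + 1))
    ]


def scout_prompt(table: str, n_queries: int = 3, n_charts: int = 4) -> str:
    """Build a single prompt that asks for every variant at once (assumed wording)."""
    return (
        f"For the table `{table}`, write {n_queries} different SQL queries. "
        f"For each query, generate {n_charts} different charts of its result. "
        "Label every output as query-<q>/chart-<c> so the variants can be compared."
    )


variants = scout_variants("orders")   # "orders" is a made-up table name
print(len(variants))                  # 3 queries x 4 charts = 12 scouts
print(scout_prompt("orders"))
```

Keeping a stable label for each variant makes it easier to note which scouts came back with a usable chart and to fold that observation into the next version of the prompt.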

Sending Multiple Scouts lets us move faster, make better decisions, and more rapidly increase the size of our Charted Territory. By sending out scouts alongside established workflows we can straddle the area between The Borderlands and Charted Territory. This lets us expand our map while leaning on established workflows to remain productive.

Sending multiple scouts is also great because implicit within it is our next point: parallelization.

Parallel-ness is Next to Godliness

Parallelization is one of the best ways to reduce the run time of agentic workflows.

Per this Stack Overflow post:

You know code is a good candidate for parallelization when it can be broken into a set of "discrete" (i.e. independent) tasks.

This definition also extends to agentic systems. In Agentic Data...
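As a rough illustration of the independence criterion quoted above, the sketch below runs a set of discrete tasks concurrently with Python's standard concurrent.futures module. The run_agent_task function is a hypothetical stand-in that only sleeps; a real workflow would replace it with an actual agent call, and the worker count is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_agent_task(task: str) -> str:
    """Hypothetical stand-in for one agent call; it only simulates waiting."""
    time.sleep(0.1)  # agent calls mostly wait on I/O, which is why threads are enough here
    return f"done: {task}"


def run_parallel(tasks: list, max_workers: int = 4) -> list:
    """Run independent tasks concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_agent_task, tasks))


# The tasks share no state and do not depend on each other's output,
# so they are "discrete" in the sense of the quote.
tasks = [f"chart-{i}" for i in range(1, 5)]
results = run_parallel(tasks)
print(results)
```

This only helps when the tasks really are independent. If one task needs another's output, the two have to run in sequence, and only the independent groups can be fanned out.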
