Agents Just Need APIs

agent-data blog

Browser automation and ad-hoc web extraction have become the two primary ways agents access web data. Intuitively, structured APIs should be a more ergonomic interface for agents. To test this intuition, we measured how reliably, how quickly, and with how few tokens an agent can complete a real web task when using structured APIs instead of browser automation or ad-hoc scraping.

FLIGHT SEARCH BENCHMARK

Cost per attempted run, by tool stack

Agents using browser automation cost ~3× as much as those using structured APIs. Agents using web search + extraction cost ~2× as much.

[Figure: bar chart of median cost per attempted run (USD), by tool stack. Structured API: $0.49, 5 / 5 passed. Web search + extraction: $0.96, 0 / 5 passed. Browser automation: $1.53, 0 / 5 passed.]

Task: Find 3 round-trip flights NYC -> SFO, 2026-06-16 to 2026-06-19. Claude Code 2.1.147 · Sonnet 4.6
Attempts were limited to 80 turns per run. Hatched bar = all attempts hit the per-run turn budget mid-task.
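The cost multipliers quoted for the chart can be reproduced from the three median costs. The snippet below is a minimal sketch of that arithmetic. The function and variable names are illustrative, not part of the benchmark, and the inputs are the rounded dollar figures, so the results may differ slightly from ratios computed on unrounded medians.

```python
# Median cost per attempted run (USD), as reported in the chart.
median_cost_usd = {
    "Structured API": 0.49,
    "Web search + extraction": 0.96,
    "Browser automation": 1.53,
}


def cost_multipliers(costs, baseline="Structured API"):
    """Return each stack's median cost divided by the baseline stack's cost."""
    base = costs[baseline]
    return {stack: cost / base for stack, cost in costs.items()}


multipliers = cost_multipliers(median_cost_usd)
for stack, multiplier in multipliers.items():
    # One decimal place is enough to compare against the "~3x" and "~2x" claims.
    print(f"{stack}: {multiplier:.1f}x the structured API cost")
```

Dividing by the structured API figure keeps the comparison in terms of the cheapest stack.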

Eval design

We gave Claude Code (running Sonnet 4.6) the following task:

Find 3 round-trip flight options for 1 adult from New York City to San Francisco, departing 2026-06-16 and returning 2026-06-19. Report a total round-trip price in USD for each of the 3 options and at least one source URL per option.

The full prompt, with airport whitelists, per-segment field requirements, and a uniqueness constraint, can be found in the appendix.

Each Claude Code session was given one of three tool stacks, and each stack was run 5 times. We capped each run at 80 turns to limit runaway sessions.

Tool stacks

Structured API: the agent-data CLI, backed by a structured flight-information API endpoint.
Web search + extraction: Tavily search and extraction tools.
Browser automation: Playwright CLI.

Note: we picked a representative implementation for each category. Our focus here is on comparing tool types, not particular providers.

Headline results

| Modality | Success | Cost per run | Input tokens per run | Output tokens per run | Latency per run | Turns per run |
|---|---|---|---|---|---|---|
| Browser automation | 0/5 | $1.53 | 3,727,282 | 13,089 | 457s (~8min) | 80* |
| Web search + extraction | 0/5 | $0.96 | 969,895 | 18,387 | 451s (~8min) | 33 |
| Structured API | 5/5 | $0.49 | 577,042 | 8,035 | 141s (~2min) | 15 |

Per-run statistics represent the medians across runs. Averages are shown in the Appendix.
* All browser automation runs hit the per-run turn limit.

Agents using browser automation were roughly 3.1× as expensive as those using a structured API and produced zero successful results. Web search + extraction agents came in around 1.9× the cost with the same zero-success result.

Cost represents model usage only, as estimated by Claude Code's total_cost_usd metadata. It can be read as how expensive it is to run an agent with a given tool stack. It does not include per-call API fees for the underlying tools (e.g., web search + extraction); adding those would widen the gap further, since the failing runs also made many more tool calls.

Web search + extraction: clean execution, but imprecise results

All 5 attempts completed cleanly within the per-run turn limit. The agents executed the canonical workflow (broad search → targeted extraction → synthesis) without obvious procedural errors. They wrote 5–13 distinct, well-refined queries per run and recovered quickly from dead ends. For example, when booking aggregators returned empty markdown (their fare grids render only after an interactive submission), the agents pivoted within 1–2 turns to schedule sites that publish fully rendered route tables. Every run produced real outbound and return flight numbers (e.g., B6 115/116, AA 179/234, and DL 363/668), pulled from schedule sources that publish JFK→SFO route data.

Zero runs produced specific, date-bound fares. With only web search + extraction tools, the agents could not interact with booking sites (Expedia) or flight-search sites (Google Flights). That left them summarizing seasonal averages from blogs and other webpages ("Jun range is typically $174–$524 one-way"), which are not bookable.

Browser automation: screenshot, screenshot, screenshot

All 5 browser-automation runs hit the 80-turn cap while consuming ~6.5× as many tokens as agents using the structured API and ~3.8× as many as those using web search + extraction. In each attempt, the agent followed a consistent pattern: open a flight-search site with a pre-encoded deep link (Sonnet had strong priors here), then enter a screenshot → click → screenshot → fill → screenshot loop to drive the search UI.

Across all 5 runs, 34% of tool calls were screenshots, with the agent re-reading page state after every action. Another 14% were session management (named sessions, tab handling) as the agent tried to isolate one booking flow from another. Actions like clicks and form...
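The token multipliers quoted for browser automation (~6.5× and ~3.8×) can be checked against the median token counts in the headline results. The writeup does not say whether those multipliers use input tokens alone or input plus output tokens, so the sketch below computes both. The names are illustrative, and treating the quoted figures as ratios of medians is an assumption.

```python
# Median (input, output) tokens per run, copied from the headline results table.
median_tokens = {
    "Browser automation": (3_727_282, 13_089),
    "Web search + extraction": (969_895, 18_387),
    "Structured API": (577_042, 8_035),
}


def token_ratio(tokens, numerator, denominator, include_output=False):
    """Ratio of median tokens between two stacks (input only by default)."""
    def count(stack):
        input_tokens, output_tokens = tokens[stack]
        return input_tokens + output_tokens if include_output else input_tokens

    return count(numerator) / count(denominator)


for include_output in (False, True):
    label = "input + output" if include_output else "input only"
    vs_api = token_ratio(
        median_tokens, "Browser automation", "Structured API", include_output
    )
    vs_search = token_ratio(
        median_tokens, "Browser automation", "Web search + extraction", include_output
    )
    print(f"{label}: {vs_api:.1f}x structured API, {vs_search:.1f}x web search")
```

Input tokens make up almost all of each total, so the two variants land close together.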
