We gave an AI agent eyes. It didn't even use them


Captain's Log #2 · Agent Voyager Project<br>May 29, 2026<br>When we saw how much Opus 4.8 cost, we decided to take a look at what the bottom shelf of the model aisle looked like. What resulted is a sort of recession-proof benchmark: how much hard work can a cheaper model accomplish, provided it's wrapped by a solid agent harness (Goose)?<br>So we reached for Claude Haiku 4.5 and gave it an extremely annoying PDF page to extract (a page from ParseBench, lifted straight from an arXiv paper). We tested two agent configurations: one that could see, via the pdf-vision MCP server, and one that could only read text, via Goose's built-in pdf_tool.<br>A few gentle spoilers on what we found, before you read on:<br>A good harness (Goose) can deliver on a genuinely tough task with an older, cheaper model, provided it has access to the right tools.<br>Blessing an agent with the gift of vision doesn't actually mean it is going to use it. In this case, the agent didn't use its eyesight at all.<br>Every step below is recorded with the Agent Voyager Project (AVP), a free, open, platform-agnostic standard for capturing what an agent does. Numbers and quotes are verbatim from the trajectories, on claude-haiku-4-5.

The page that eats parsers<br>Four tables crammed onto one sheet<br>This is page 7 of a 2012 econometrics paper, pulled from ParseBench. Four separate tables are crammed onto it. The one that matters is Table 7: two six-by-six correlation matrices stacked on top of each other, triangular, half the cells blank, and values like 0.47 [0.49] where two numbers share one cell.

Table 7. Easy enough to read with your eyes, but brutal to read as text, because the layout carries the meaning and flat text throws the layout away.<br>The task we gave Goose was easy to state: download the page, rebuild it as an HTML table, do not get it wrong.
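To make the target concrete, here is a minimal sketch of what "rebuild it as an HTML table" means for a triangular matrix. The function name `triangular_to_html` is hypothetical, and the labels F, FB and SEFV are our reading of the garbled header text in the extraction, not confirmed column names. Only the first three rows of the first matrix are shown.

```python
from html import escape

def triangular_to_html(labels, rows):
    """Render a lower-triangular matrix as an HTML table.

    rows[i] holds the i + 1 cell strings of row i. The blank upper
    triangle is emitted as empty cells so every row has the same width.
    """
    n = len(labels)
    if len(rows) != n or any(len(r) != i + 1 for i, r in enumerate(rows)):
        raise ValueError("rows must form a lower triangle matching labels")
    header = "".join(f"<th>{escape(label)}</th>" for label in labels)
    out = ["<table>", f"<tr><th></th>{header}</tr>"]
    for label, row in zip(labels, rows):
        cells = list(row) + [""] * (n - len(row))  # pad the blank half
        body = "".join(f"<td>{escape(cell)}</td>" for cell in cells)
        out.append(f"<tr><th>{escape(label)}</th>{body}</tr>")
    out.append("</table>")
    return "\n".join(out)

# Illustrative subset; labels are an assumption (see lead-in).
labels = ["F", "FB", "SEFV"]
rows = [["1"], ["0.99998", "1"], ["0.99714", "0.99671", "1"]]
html_table = triangular_to_html(labels, rows)
print(html_table)
```

Cells are kept as strings so that a shared cell such as 0.47 [0.49] can pass through unchanged.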

Attempt 1 · Goose + pdf_tool · 5 turns · $0.05 · 53% failed

First, the obvious move. Goose's built-in PDF reader (pdf_tool, a pdfplumber wrapper) pulls the text off the page. Here is what it handed back.

what pdf_tool returned<br>… Mar c h FB 4 - 7.309 O 9 - 1.513 69.312 1531.360 7.270 Ta ble 7. C ross c or r e latio n c oe f fic ien ts fo r six C P I ti me se rie s a nd their fir s t diff e r e nc e s. Or i g inal se rie s include 1 24 r e a din g s, and th e ir f irst di ff e re n c e s 123 r e a din g s. F FB SE F V O R PR R SH O F 1 FB 0.99998 1 SE F V 0.99714 0.99671 1 O R PR 0.98356 0.98295 0.98702 1 R SH 0. 97533 0.97478 0.97736 0.99698 1 O 0.97752 0.97661 0.98664 0.95629 0.93924 1 d F d FB d SE F V d O R PR d R SH d O d F 1 d FB 0.994 1 d SE F V 0.47 [ 0. 49] 0.4 8 [ 0. 49] 1 …<br>Every table on the page, poured into one run-on stream. Aug ust. ORP R. ti me se rie s. No rows, no columns, no way to tell where one table ends and the next begins. Goose even tried to pull images for the structure and got back “No images found in PDF,” so it worked with the text. It rebuilt the matrices, re-read to check, and declared victory.
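A small sketch of why this stream is so hard to use. Repairing a single cell is easy once you know where it starts and ends; the hypothetical `parse_cell` below does it by dropping the stray whitespace. The catch is the assumption baked into it: the caller must already know the cell boundaries, and that is exactly the information the flat text has lost.

```python
import re

# One number, optionally followed by a second number in square brackets.
CELL = re.compile(r"(-?\d+(?:\.\d+)?)(?:\[(-?\d+(?:\.\d+)?)\])?")

def parse_cell(raw):
    """Parse the text of ONE cell into (value, bracketed_value_or_None).

    Assumes `raw` covers exactly one cell. Whitespace is stripped because
    the extraction inserts spaces inside numbers ("0. 49", "0.4 8").
    """
    compact = re.sub(r"\s+", "", raw)
    match = CELL.fullmatch(compact)
    if match is None:
        raise ValueError(f"not a single cell: {raw!r}")
    return match.group(1), match.group(2)

print(parse_cell("0.47 [ 0. 49]"))
print(parse_cell("0.4 8 [ 0. 49]"))

# Two neighbouring cells cannot be told apart from one damaged cell by
# whitespace alone, so the same repair rejects them when passed together.
try:
    parse_cell("0. 97533 0.97478")
except ValueError as err:
    print(err)
```

In the stream above, a space can mean "next cell" or "damage inside a number", so no whitespace rule recovers the columns on its own.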

“All values match perfectly.”<br>Goose, on claude-haiku-4-5, right before scoring 53%<br>It scored 53%, a clear fail, and nothing in the run flagged it: five turns and five cents, which made it the fastest run we recorded on the hardest page on the board. The score on its own just says “fail,” but the trajectory shows something worse, which is an agent that was confidently wrong and had no idea.
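One way to avoid relying on the model's own re-read is a check that lives outside it. The sketch below is not something the run used, and we do not know which cells the run got wrong, so it may not have caught this particular failure. It only tests properties any lower-triangular correlation matrix must have; `triangle_problems` is a hypothetical name.

```python
def triangle_problems(rows):
    """Return a list of structural problems in a lower-triangular
    correlation matrix given as a list of rows of floats."""
    problems = []
    for i, row in enumerate(rows):
        if len(row) != i + 1:
            problems.append(f"row {i}: expected {i + 1} cells, got {len(row)}")
            continue  # shape is wrong, positional checks would mislead
        if row[-1] != 1.0:
            problems.append(f"row {i}: diagonal is {row[-1]}, expected 1")
        for j, value in enumerate(row):
            if not -1.0 <= value <= 1.0:
                problems.append(f"row {i}, col {j}: {value} is outside [-1, 1]")
    return problems

# Values taken from the extraction quoted above (first three rows).
good = [[1.0], [0.99998, 1.0], [0.99714, 0.99671, 1.0]]
# The same numbers with one cell attached to the wrong row.
shifted = [[1.0], [0.99998, 1.0, 0.99714], [0.99671, 1.0]]

print(triangle_problems(good))
print(triangle_problems(shifted))
```

A check like this cannot prove a table is right, but it can turn some silent failures into loud ones.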


Model: claude-haiku-4-5 · Tools: shell, write, edit, computercontroller__web_scrape, computercontroller__pdf_tool

The whole voyage, recorded: a quick, smooth, doomed five turns. The full AVP JSON has the agent's own words at each step.<br>The full commission prompt: Download the PDF page at {input}, read it, rebuild it as a single HTML table, then re-read the original and verify your table matches before returning. Output only the HTML.<br>Attempt 2 · Goose + pdf-vision · 24 turns · $0.33 · 100% passed

Same model, same page. This time we swapped the built-in reader for pdf-vision, an MCP server whose headline trick is rendering the page to an image so the model can look at it. We even told Goose, in the prompt, to trust the picture over the text.<br>The eyes did not work. The image kept coming back empty, so Goose tried to see the page another way, and then kept trying as each new approach failed.

“Now let me render the page as an image to see the table structure.”<br>get_page_image: came back empty<br>“Let me get the page image with file output so I can view it.”<br>get_page_image: saved a PNG it still can't see<br>“Now let me view the image using the computer controller.”<br>open page.png: nothing<br>“Let me use computer controller to view the image.”<br>cat page.png | base64: a wall of base64<br>“Let me examine the PDF more carefully to identify which page contains the table.”<br>get_pdf_info: “the PDF has 1 page”<br>“Each Y position represents a row. Let me build the complete...
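The last quote points at a coordinate-based fallback: if each word comes with a position, words that share a Y value belong to the same row. What the agent actually ran is not shown here, so the following is only a sketch of that idea. It assumes word dicts with `text`, `x0` and `top` keys (the shape pdfplumber's `extract_words()` returns); the output format of the pdf-vision server may differ. `group_rows` and the `tolerance` value are hypothetical.

```python
def group_rows(words, tolerance=2.0):
    """Group positioned words into rows by their vertical position.

    Words whose `top` lies within `tolerance` of the first word of the
    current row are placed in that row; each row is then ordered left
    to right by `x0`.
    """
    rows = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if rows and abs(word["top"] - rows[-1]["top"]) <= tolerance:
            rows[-1]["words"].append(word)
        else:
            rows.append({"top": word["top"], "words": [word]})
    return [
        [w["text"] for w in sorted(row["words"], key=lambda w: w["x0"])]
        for row in rows
    ]

# Synthetic coordinates for illustration; only the texts come from the page.
words = [
    {"text": "0.994", "x0": 80.0, "top": 200.3},
    {"text": "dFB", "x0": 40.0, "top": 200.0},
    {"text": "1", "x0": 120.0, "top": 200.1},
    {"text": "dF", "x0": 40.0, "top": 188.0},
    {"text": "1", "x0": 80.0, "top": 188.2},
]
table_rows = group_rows(words)
print(table_rows)
```

Column assignment would need the same treatment on `x0`, since a triangular table leaves later columns empty in early rows.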
