Browser agent that reads a page in ~2k tokens, not ~180k

pixelpi

0.1.6 • Public • Published 5 days ago • 4 Dependencies • 0 Dependents • 7 Versions

A minimal browser-agent harness. Six tools, raw CDP, any model.

The page is the prompt.

npm i -g pixelpi

pixelpi "find the top story on Hacker News": the agent opens a real Chrome, looks once, and reports the title in a few steps. No Playwright, no vision model, no cloud.

Every other browser agent buries the model under a 20–30-tool MCP surface and a raw-DOM firehose. pixelpi gives it six primitives and a bounded view of the page. A heavy page that costs ~180k tokens as raw DOM, pixelpi hands the model in ~2k. That's a 37× median saving, and up to ~100× on the heaviest pages, across real sites, and the cost stays flat as the page grows. The model already knows how to use a browser; pixelpi just gets out of the way.

If pixelpi saves you a 30-tool MCP install, a star helps others find it.

Install

npm i -g pixelpi   # the CLI
pixelpi            # first run → guided setup, then an interactive chat

Quickstart

npm i -g pixelpi   # 1. install the global binary
pixelpi auth       # 2. set provider + key (or: export ANTHROPIC_API_KEY=…)
pixelpi "find the top story on Hacker News and store its title"   # 3. run a task

First run with no config drops you into guided setup (provider · key · model), then an interactive browser-agent chat. pixelpi --json "…" emits NDJSON for scripts.
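For scripting against `--json`, a minimal sketch of reading NDJSON (one JSON object per line) might look like the following. The event fields in the sample (`type`, `tool`, `result`) are hypothetical placeholders; this README does not describe the actual schema pixelpi emits.

```typescript
// Parse NDJSON: one JSON value per line; blank lines are skipped.
// Only plain objects are kept; other JSON values are ignored.
type JsonRecord = Record<string, unknown>;

function parseNdjson(text: string): JsonRecord[] {
  const records: JsonRecord[] = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    const value: unknown = JSON.parse(trimmed);
    if (typeof value === "object" && value !== null && !Array.isArray(value)) {
      records.push(value as JsonRecord);
    }
  }
  return records;
}

// Hypothetical sample output; the real field names may differ.
const sampleOutput =
  '{"type":"step","tool":"look"}\n\n{"type":"done","result":"ok"}\n';
const parsedEvents = parseNdjson(sampleOutput);
console.log(parsedEvents.length); // 2
```

In a real script the text would come from the stdout of `pixelpi --json "…"`, read line by line rather than all at once.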

Sessions and login

Every run uses a fresh, disposable Chrome profile by default (logged out). To stay logged in across runs, use a persistent profile:

pixelpi login https://github.com                  # opens a real Chrome; sign in, press Enter to save
pixelpi --profile "check my GitHub notifications" # reuses the saved session, headless

--profile uses ~/.pixelpi/profile; --profile= uses a custom one (handy for separate accounts).

Omit --profile for a fresh disposable profile each run.

Chrome locks a profile dir, so don't run two tasks against the same profile at once.

pixelpi finds Chrome automatically on macOS, Linux, and Windows. Set PIXELPI_CHROME=/path/to/chrome to override.

Record and replay

Save a solved run as a trace and replay it later with no model in the loop. The first run is the compile step; every replay is the binary: free, deterministic, and fast.

pixelpi "find the top story on Hacker News" --record hn-top   # solve once, save a trace
pixelpi replay hn-top                                         # rerun it with no model, 0 tokens
pixelpi replay hn-top --heal                                  # repair one step with the model if the page drifted

Traces key on the accessibility role and name of each element, not CSS selectors or coordinates, so they survive most layout churn. A bare name lives in ~/.pixelpi/traces/; pass a path (or a name ending in .json) to keep a trace inside a repo.
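To illustrate why keying on role and name survives layout churn, here is a minimal sketch of resolving a recorded step against a fresh snapshot. The `SnapshotNode` and `TraceStep` shapes, the ref strings, and the rule that an ambiguous match counts as drift are all assumptions made for illustration, not pixelpi's actual trace format.

```typescript
// Hypothetical shapes: the README only states that traces key on the
// accessibility role and name of each element.
interface SnapshotNode {
  ref: string;  // per-snapshot handle; may change between runs
  role: string; // accessibility role, e.g. "link" or "button"
  name: string; // accessible name
}

interface TraceStep {
  op: string;
  role: string;
  name: string;
}

// Return the ref of the single node matching the step's role and name,
// or null when nothing matches or the match is ambiguous (treated as drift).
function resolveStep(step: TraceStep, snapshot: SnapshotNode[]): string | null {
  const matches = snapshot.filter(
    (n) => n.role === step.role && n.name === step.name,
  );
  const match = matches[0];
  return matches.length === 1 && match !== undefined ? match.ref : null;
}

const recordedStep: TraceStep = { op: "click", role: "link", name: "Sign in" };

// Same element, different position and ref after a redesign.
const oldLayout: SnapshotNode[] = [
  { ref: "e3", role: "link", name: "Sign in" },
  { ref: "e4", role: "button", name: "Search" },
];
const newLayout: SnapshotNode[] = [
  { ref: "e9", role: "button", name: "Search" },
  { ref: "e17", role: "link", name: "Sign in" },
];

console.log(resolveStep(recordedStep, oldLayout)); // "e3"
console.log(resolveStep(recordedStep, newLayout)); // "e17"
console.log(resolveStep(recordedStep, []));        // null
```

A CSS selector or coordinate recorded against the old layout would break here, while the role and name pair still identifies the same element.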

--record writes only when the run completes. Omit the name and it auto-slugs the task.

Strict replay needs no API key. On drift it stops and exits 3, naming the step that no longer matches. --heal re-derives just that step with the model and rewrites the trace, so it self-corrects over time.
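A wrapper script can branch on the replay exit code. The sketch below is a pure decision function: exit code 3 for drift comes from the text above, while treating 0 as success and any other code as a hard failure is an assumption based on common CLI convention. The `NextAction` type and `afterReplay` name are hypothetical.

```typescript
// Decide what to do after `pixelpi replay <trace>` exits.
type NextAction =
  | { kind: "done" }
  | { kind: "heal"; argv: string[] }
  | { kind: "fail"; code: number };

function afterReplay(trace: string, exitCode: number): NextAction {
  if (exitCode === 0) return { kind: "done" }; // assumed success code
  if (exitCode === 3) {
    // Documented drift code: retry with --heal, which uses the model.
    return { kind: "heal", argv: ["pixelpi", "replay", trace, "--heal"] };
  }
  return { kind: "fail", code: exitCode }; // assumed: anything else is an error
}

console.log(afterReplay("hn-top", 3));
```

Since strict replay needs no API key but `--heal` calls the model, a CI job could run the strict form and surface the `heal` command for a human or a keyed job to run.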

Replay reproduces actions, not intent: it is for stable, repeated flows (a login, an export, a scrape). --heal is what reintroduces judgment when a page has genuinely changed.

The six primitives

look · act · fill · nav · eval · store

look(mode?, filter?) : compact, ref-indexed accessibility/DOM snapshot. The read.

act(ref, op, value?) : mutate the page by stable ref via trusted CDP input events. The write/edit.

fill(fields[]) : batched form fill in one call.

nav(action, arg?) : navigate, tabs, waitfor. The cd / processes.

eval(fn, args?, opts?) : arbitrary JS in the page realm. The escape hatch, the bash of the browser.

store(action, key?, value?) : durable host-side JSON KV. The filesystem.

Elements are addressed by stable ref (not CSS/coordinates): cheap, deterministic, resilient to layout churn. Everything else is composable from eval; the agent writes its own higher-level tools as JSON skills at runtime, and only each skill's one-line description enters the prompt.
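The point about skills can be sketched as follows: the full skill body stays on the host, and only a one-line summary per skill is rendered into the prompt. The `Skill` shape and the `skillPromptLines` helper are hypothetical; the README does not document the skill file format.

```typescript
// Hypothetical skill record: the body can be arbitrarily long, but only
// the first line of its description is ever shown to the model.
interface Skill {
  name: string;
  description: string;
  body: string; // e.g. JS source to run through eval; never placed in the prompt
}

function skillPromptLines(skills: Skill[]): string[] {
  return skills.map((s) => {
    const firstLine = (s.description.split("\n")[0] ?? "").trim();
    return `${s.name}: ${firstLine}`;
  });
}

const exampleSkills: Skill[] = [
  {
    name: "table_to_rows",
    description: "Extract an HTML table as an array of row objects.\nDetails omitted.",
    body: "/* many lines of page-realm JS */",
  },
];

console.log(skillPromptLines(exampleSkills));
// [ 'table_to_rows: Extract an HTML table as an array of row objects.' ]
```

Under this design the prompt cost grows by one short line per skill, regardless of how large each skill's implementation is.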

Why it's different

|  | pixelpi | Playwright MCP | Chrome DevTools MCP |
| --- | --- | --- | --- |
| Tools in context | 6 | 21 | 31 |
| Tool-def + prompt tokens | ~1,055 | ~13,700 | ~18,000 |
| Page representation | a11y tree (bounded) | mixed | mixed |
| Substrate | raw CDP (no Playwright) | Playwright | CDP |
| Self-extension | agent writes JS skills at runtime | no | no |
| Replay | record once, replay with 0 tokens | no | no |

Token cost: look() vs a raw-DOM dump, measured across the 15 sites WebVoyager tests on (full table + script in bench/):

| Site | look() | raw DOM | factor |
| --- | --- | --- | --- |
| Coursera | 1,997 tok | 202,892 tok | 101.6× |
| GitHub | 1,955 tok | 146,787 tok | 75.1× |
| Apple | 2,254 tok | 96,507 tok | 42.8× |
| Hugging Face | 1,932 tok | 45,300 tok | 23.4× |
| ArXiv | 1,588 tok | 10,652 tok | 6.7× |

The median saving is 37×, and the rows shown here range from 6.7× (ArXiv) to 101.6× (Coursera). look() holds roughly 2k tokens whatever the page weighs, while the raw DOM keeps growing. Five of the twelve sites bot-block headless Chrome and return an empty page; bench/ has the full run. Reproduce it yourself: pnpm bench:tokens, no key...
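The factor column is simply raw-DOM tokens divided by look() tokens, rounded to one decimal. The sketch below recomputes it for the five rows listed in this README; the median it prints covers only those five rows, not the full benchmark in bench/, so it is not expected to match the 37× figure.

```typescript
interface TokenRow {
  site: string;
  look: number;   // tokens for look()
  rawDom: number; // tokens for the raw DOM dump
}

// Values copied from the table in this README.
const tokenRows: TokenRow[] = [
  { site: "Coursera", look: 1997, rawDom: 202892 },
  { site: "GitHub", look: 1955, rawDom: 146787 },
  { site: "Apple", look: 2254, rawDom: 96507 },
  { site: "Hugging Face", look: 1932, rawDom: 45300 },
  { site: "ArXiv", look: 1588, rawDom: 10652 },
];

// Reduction factor, rounded to one decimal place.
function reductionFactor(row: TokenRow): number {
  return Math.round((row.rawDom / row.look) * 10) / 10;
}

function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

const factors = tokenRows.map(reductionFactor);
console.log(factors);         // [ 101.6, 75.1, 42.8, 23.4, 6.7 ]
console.log(median(factors)); // 42.8 (listed rows only)
```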
