Browser agent that reads a page in ~2k tokens, not ~180k


pixelpi

0.1.6 • Public • Published 5 days ago • 4 Dependencies • 0 Dependents • 7 Versions


A minimal browser-agent harness. Six tools, raw CDP, any model.

The page is the prompt.

npm i -g pixelpi

pixelpi "find the top story on Hacker News": the agent opens a real Chrome, looks once, and reports the title in a few steps. No Playwright, no vision model, no cloud.

Every other browser agent buries the model under a 20–30-tool MCP surface and a raw-DOM firehose. pixelpi gives it six primitives and a bounded view of the page. A heavy page that costs ~180k tokens as raw DOM reaches the model in ~2k through pixelpi. That's a median of 37× and up to ~100× fewer tokens across real sites, and the cost stays flat as the page grows. The model already knows how to use a browser; pixelpi just gets out of the way.

If pixelpi saves you a 30-tool MCP install, a star helps others find it.

Install

npm i -g pixelpi   # the CLI
pixelpi            # first run → guided setup, then an interactive chat

Quickstart

npm i -g pixelpi   # 1. install the global binary
pixelpi auth       # 2. set provider + key (or: export ANTHROPIC_API_KEY=…)
pixelpi "find the top story on Hacker News and store its title"   # 3. run a task

First run with no config drops you into guided setup (provider · key · model), then an interactive browser-agent chat. pixelpi --json "…" emits NDJSON for scripts.
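A script consuming the --json output only needs to split it on newlines and parse each line. The sketch below assumes nothing about the event fields, which this README does not document: the sample payload is hypothetical and parsed values stay typed as unknown.

```typescript
// Parse NDJSON text: one JSON value per non-empty line.
// The event shape emitted by `pixelpi --json` is not shown in this README,
// so values are left as `unknown` for the caller to narrow.
function parseNdjson(text: string): unknown[] {
  const events: unknown[] = [];
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue; // tolerate blank lines and a trailing newline
    events.push(JSON.parse(trimmed)); // throws SyntaxError on a malformed line
  }
  return events;
}

// Hypothetical sample; the real field names may differ.
const sample = '{"type":"step","tool":"look"}\n{"type":"done"}\n';
const events = parseNdjson(sample);
console.log(events.length); // 2
```

For long runs, the same per-line parse can be applied incrementally to stdout chunks instead of buffering the whole output.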

Sessions and login

Every run uses a fresh, disposable Chrome profile by default (logged out). To stay logged in across runs, use a persistent profile:

pixelpi login https://github.com   # opens a real Chrome; sign in, press Enter to save
pixelpi --profile "check my GitHub notifications"   # reuses the saved session, headless

--profile uses ~/.pixelpi/profile; --profile=<dir> uses a custom profile directory (handy for separate accounts).

Omit --profile for a fresh disposable profile each run.

Chrome locks a profile dir, so don't run two tasks against the same profile at once.

pixelpi finds Chrome automatically on macOS, Linux, and Windows. Set PIXELPI_CHROME=/path/to/chrome to override.

Record and replay

Save a solved run as a trace and replay it later with no model in the loop. The first run is the compile step; every replay is the binary: free, deterministic, and fast.

pixelpi "find the top story on Hacker News" --record hn-top   # solve once, save a trace
pixelpi replay hn-top          # rerun it with no model, 0 tokens
pixelpi replay hn-top --heal   # repair one step with the model if the page drifted

Traces key on the accessibility role and name of each element, not CSS selectors or coordinates, so they survive most layout churn. A bare name lives in ~/.pixelpi/traces/; pass a path (or a name ending in .json) to keep a trace inside a repo.
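To see why keying on role and name tolerates layout churn, consider a minimal sketch of step resolution. The TraceStep and SnapshotNode shapes and the resolveStep helper are hypothetical, since pixelpi's real trace format is not shown here, and treating an ambiguous match as drift is an assumption of this sketch.

```typescript
// Hypothetical shapes: pixelpi's actual trace and snapshot formats may differ.
interface SnapshotNode { ref: string; role: string; name: string; }
interface TraceStep { role: string; name: string; op: string; }

// Resolve a recorded step to a ref in the current snapshot by role + name.
// Returns null when nothing matches or the match is ambiguous (treated as drift).
function resolveStep(step: TraceStep, snapshot: SnapshotNode[]): string | null {
  const matches = snapshot.filter(
    (n) => n.role === step.role && n.name === step.name,
  );
  const [only, ...rest] = matches;
  return only !== undefined && rest.length === 0 ? only.ref : null;
}

const step: TraceStep = { role: "link", name: "Sign in", op: "click" };

// Same page before and after a redesign: order and refs changed, role and name did not.
const before: SnapshotNode[] = [
  { ref: "e1", role: "link", name: "Sign in" },
  { ref: "e2", role: "button", name: "Search" },
];
const after: SnapshotNode[] = [
  { ref: "e7", role: "button", name: "Search" },
  { ref: "e9", role: "link", name: "Sign in" },
];

console.log(resolveStep(step, before)); // e1
console.log(resolveStep(step, after)); // e9
```

A CSS selector or a coordinate recorded against the first layout would likely miss in the second; the role and name pair still resolves.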

--record writes only when the run completes. Omit the name and it auto-slugs the task.

Strict replay needs no API key. On drift it stops and exits 3, naming the step that no longer matches. --heal re-derives just that step with the model and rewrites the trace, so it self-corrects over time.
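The exit code makes a strict-first, heal-on-drift wrapper straightforward to script. The sketch below injects the process runner so the logic can be tested without Chrome. The replayWithFallback helper and Runner type are illustrative names, and treating exit 0 as success is an assumption (the text above only documents exit 3 for drift).

```typescript
// A runner executes `pixelpi <args>` and returns the process exit code.
// It is injected so the decision logic can be exercised with a stub.
type Runner = (args: string[]) => number;

type Outcome = "ok" | "healed" | "failed";

function replayWithFallback(trace: string, run: Runner): Outcome {
  const strict = run(["replay", trace]);
  if (strict === 0) return "ok"; // assumption: 0 means the replay succeeded
  if (strict !== 3) return "failed"; // only exit 3 is documented as drift
  // Drift: re-derive the broken step with the model. Unlike strict replay,
  // this path needs a configured provider and key.
  const healed = run(["replay", trace, "--heal"]);
  return healed === 0 ? "healed" : "failed";
}

// Stub runner for illustration: strict replay drifts (3), healed replay succeeds (0).
const stub: Runner = (args) => (args.indexOf("--heal") !== -1 ? 0 : 3);
console.log(replayWithFallback("hn-top", stub)); // healed
```

In a real script the runner could wrap Node's child_process.spawnSync("pixelpi", args) and return its status field.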

Replay reproduces actions, not intent: it is for stable, repeated flows (a login, an export, a scrape). --heal is what reintroduces judgment when a page has genuinely changed.

The six primitives

look · act · fill · nav · eval · store

look(mode?, filter?) : compact, ref-indexed accessibility/DOM snapshot. The read.

act(ref, op, value?) : mutate the page by stable ref via trusted CDP input events. The write/edit.

fill(fields[]) : batched form fill in one call.

nav(action, arg?) : navigate, tabs, waitfor. The cd / processes.

eval(fn, args?, opts?) : arbitrary JS in the page realm. The escape hatch, the bash of the browser.

store(action, key?, value?) : durable host-side JSON KV. The filesystem.

Elements are addressed by stable ref (not CSS/coordinates): cheap, deterministic, resilient to layout churn. Everything else is composable from eval; the agent writes its own higher-level tools as JSON skills at runtime, and only each skill's one-line description enters the prompt.

Why it's different

| | pixelpi | Playwright MCP | Chrome DevTools MCP |
|---|---|---|---|
| Tools in context | 6 | 21 | 31 |
| Tool-def + prompt tokens | ~1,055 | ~13,700 | ~18,000 |
| Page representation | a11y tree (bounded) | mixed | mixed |
| Substrate | raw CDP (no Playwright) | Playwright | CDP |
| Self-extension | agent writes JS skills at runtime | no | no |
| Replay | record once, replay with 0 tokens | no | no |

Token cost: look() vs a raw-DOM dump, measured across the 15 sites WebVoyager tests on (full table + script in bench/):

| Site | look() | raw DOM | factor |
|---|---|---|---|
| Coursera | 1,997 tok | 202,892 tok | 101.6× |
| GitHub | 1,955 tok | 146,787 tok | 75.1× |
| Apple | 2,254 tok | 96,507 tok | 42.8× |
| Hugging Face | 1,932 tok | 45,300 tok | 23.4× |
| ArXiv | 1,588 tok | 10,652 tok | 6.7× |

The median saving across the measured sites is 37×, and the heaviest pages reach ~100×; light pages such as ArXiv save less because their raw DOM is small to begin with. look() holds ~2k tokens whatever the page weighs, while the raw DOM keeps growing. Five of the sites bot-block headless Chrome and return an empty page; bench/ has the full run. Reproduce it yourself: pnpm bench:tokens, no key...
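The factor column is simply raw-DOM tokens divided by look() tokens. A short sketch that recomputes it from the five rows shown above follows. The helper names are illustrative, and the median it prints covers only these five rows, not the full bench/ run.

```typescript
// Token counts copied from the five rows of the table above.
const rows = [
  { site: "Coursera", look: 1997, rawDom: 202892 },
  { site: "GitHub", look: 1955, rawDom: 146787 },
  { site: "Apple", look: 2254, rawDom: 96507 },
  { site: "Hugging Face", look: 1932, rawDom: 45300 },
  { site: "ArXiv", look: 1588, rawDom: 10652 },
];

// Reduction factor, rounded to one decimal place as in the table.
function factor(look: number, rawDom: number): number {
  return Math.round((rawDom / look) * 10) / 10;
}

// Median of a non-empty list (mean of the two middle values when even-length).
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]!
    : (sorted[mid - 1]! + sorted[mid]!) / 2;
}

const factors = rows.map((r) => factor(r.look, r.rawDom));
console.log(factors); // [ 101.6, 75.1, 42.8, 23.4, 6.7 ]
console.log(median(factors)); // 42.8 (these five rows only)
```

The look() column varies by a few hundred tokens while the raw DOM spans more than an order of magnitude, which is why the factor tracks page weight.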
