LocalContextRouter – stop paying vision-token prices for text PDF pages


sid732/LocalContextRouter: Preflight router that decides locally whether each document page reaches a multimodal model as text, on-device OCR, or an image. Cuts vision-token cost. macOS.


LocalContextRouter

Decide locally how each page of a document should reach a multimodal model: as extracted text, as on-device OCR output, or as a rendered image. That keeps you from paying for vision tokens on pages that are only text.

A multimodal model typically reads a PDF by extracting its text and rendering every page as an image, and you are billed for both. On a text page that image runs roughly 1,300 to 4,800 tokens, while the same page as plain text is 400 to 800. For a text-dominant document, that is several times the cost with no extra information. LocalContextRouter does the cheap work on your machine first and tells you what each page actually needs.
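To make the arithmetic concrete, here is a rough sketch that uses only the ranges quoted above. It assumes a page sent inside a PDF is billed for its text plus its image; `pdf_vs_text_ratio` is a name made up for illustration, not part of the package.

```python
# Per-page token ranges quoted above (rough figures, not measurements).
TEXT_TOKENS = (400, 800)
IMAGE_TOKENS = (1300, 4800)


def pdf_vs_text_ratio(text_tokens: int, image_tokens: int) -> float:
    """Cost of a page billed as text plus image, relative to text alone."""
    return (text_tokens + image_tokens) / text_tokens


# Smallest gap: a dense text page (800) paired with a cheap image (1,300).
low = pdf_vs_text_ratio(TEXT_TOKENS[1], IMAGE_TOKENS[0])
# Largest gap: a sparse text page (400) paired with an expensive image (4,800).
high = pdf_vs_text_ratio(TEXT_TOKENS[0], IMAGE_TOKENS[1])

print(f"{low:.1f}x to {high:.1f}x")
```

Under that billing assumption, even the most favorable case more than doubles the per-page cost of a pure text page.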

It does not call a model. It returns a per-page decision and the text to send; your application still makes the call.

How it decides

For each page:

- A usable text layer that is mostly prose: use the extracted text.

- A text layer dominated by a table, chart, or diagram: send the page as an image, where the layout carries the meaning.

- No usable text, such as a scan or a photo: recognize it on-device with Apple's Vision framework.
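The three rules above can be sketched as a small decision function. This is only an illustration of the idea: `PageStats`, `Route`, and both thresholds are hypothetical, and the package's actual heuristics and cut-offs are not described here.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    TEXT = "text"
    VISION = "vision"
    OCR = "ocr"


@dataclass
class PageStats:
    """Hypothetical per-page measurements; not the library's data model."""
    text_chars: int         # characters found in the PDF text layer
    visual_fraction: float  # share of the page taken by tables, charts, diagrams


# Illustrative thresholds only (assumed values, not taken from the package).
MIN_TEXT_CHARS = 50
MAX_VISUAL_FRACTION = 0.5


def choose_route(stats: PageStats) -> Route:
    if stats.text_chars < MIN_TEXT_CHARS:
        return Route.OCR     # no usable text layer: scan or photo
    if stats.visual_fraction > MAX_VISUAL_FRACTION:
        return Route.VISION  # the layout carries the meaning
    return Route.TEXT        # mostly prose: extracted text is enough


print(choose_route(PageStats(text_chars=2400, visual_fraction=0.1)))
```

Checking for a missing text layer first keeps scans out of the layout test, which has nothing to measure on a page without text.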

The result also reports how many tokens you saved against sending every page as an image.

Install

pip install localcontextrouter

macOS only. The wheel bundles a universal (Apple Silicon and Intel) OCR binary, so text recognition works with no extra setup.

Command line

localctx invoice.pdf
localctx invoice.pdf --json
localctx scan.png

localctx invoice.pdf prints each page, the source chosen for it, and the tokens saved:

Document: invoice.pdf (3 pages)
Tokens saved vs sending every page as an image: 3085

Page 1 [text]
ACME Corp, Invoice #4471 ...

Page 2 [vision]
Quarterly results by segment ...

Page 3 [ocr]
SCANNED RECEIPT TOTAL 42.00

Add --vision-dir DIR to render the pages that should go to the model as images into DIR; their paths are then listed in the output and the JSON.
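If you want to consume the --json output from a script, a thin wrapper along these lines may be enough. It is a sketch that assumes the command writes a single JSON document to standard output; it makes no assumption about the keys inside, and `run_json_command` is a hypothetical helper, not part of the package.

```python
import json
import subprocess
import sys


def run_json_command(argv: list[str]) -> object:
    """Run a command and parse its standard output as JSON."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Stand-in command so the sketch runs anywhere; on macOS swap in the real one:
#   run_json_command(["localctx", "invoice.pdf", "--json"])
demo = run_json_command([sys.executable, "-c", "print('{\"ok\": true}')"])
print(demo)
```

Because `check=True` is set, a non-zero exit status raises `subprocess.CalledProcessError` instead of handing malformed output to the JSON parser.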

In code

from localcontextrouter import route_pdf, Source

# send_image and send_text stand for your application's own model calls.
result = route_pdf("invoice.pdf")
for page in result.pages:
    if page.source is Source.VISION:
        send_image(page.index)  # the page's meaning is visual
    else:
        send_text(page.text)    # extracted or recognized text

print(result.tokens_saved)

Every page also carries an estimate of its cost both ways, as page.tokens.text_tokens and page.tokens.image_tokens.
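One way to read those per-page estimates is to total the difference for every page that is not sent as an image. The sketch below uses stand-in classes that mirror the attribute names above so it runs without the package; the `TEXT` and `OCR` enum members and all token figures are made up, and whether result.tokens_saved is computed exactly this way is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Stand-in for localcontextrouter.Source; only VISION is documented above."""
    TEXT = "text"
    OCR = "ocr"
    VISION = "vision"


@dataclass
class TokenEstimate:
    text_tokens: int
    image_tokens: int


@dataclass
class Page:
    source: Source
    tokens: TokenEstimate


def estimated_savings(pages: list[Page]) -> int:
    """Tokens avoided versus sending every page as an image."""
    saved = 0
    for page in pages:
        if page.source is not Source.VISION:
            # Sent as text instead of an image: the difference is the saving.
            saved += page.tokens.image_tokens - page.tokens.text_tokens
    return saved


# Invented figures, for illustration only.
pages = [
    Page(Source.TEXT, TokenEstimate(text_tokens=600, image_tokens=1500)),
    Page(Source.VISION, TokenEstimate(text_tokens=300, image_tokens=1500)),
    Page(Source.OCR, TokenEstimate(text_tokens=100, image_tokens=1500)),
]
print(estimated_savings(pages))
```

Pages routed to vision contribute nothing, since they cost the same as in the all-images baseline.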

As an agent skill

local-context-router is an Agent Skill in the open SKILL.md format, so it works in Claude Code and other compatible agents. It lives in this repository under .claude/skills/local-context-router; copy that folder into your agent's skills directory:

cp -r .claude/skills/local-context-router ~/.claude/skills/

With the package installed, the agent runs the preflight on any PDF or image you share, then uses the text for the cheap pages and attaches images only for the visual ones.

Requirements and scope

macOS 11 or newer. Recognition uses the Apple Vision framework and needs a normal macOS graphics environment; it will not run inside a headless sandbox that lacks one.

Python 3.10 or newer.

The scope is per-page routing, on-device OCR, and a token estimate. Retrieval over very large documents is out of scope.
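An application that may run on other platforms could guard the import with a check like the one below. This is a sketch using only the standard library; `meets_requirements` is a hypothetical helper, and it cannot detect whether a graphics environment is available.

```python
import platform
import sys


def meets_requirements(
    system: str, mac_version: str, python_version: tuple[int, int]
) -> bool:
    """True when the platform matches the stated minimums (macOS 11, Python 3.10)."""
    if system != "Darwin":
        return False
    # mac_version looks like "14.5" or "10.15.7"; it is empty off macOS.
    major = int(mac_version.split(".")[0]) if mac_version else 0
    return major >= 11 and python_version >= (3, 10)


ok = meets_requirements(
    platform.system(),
    platform.mac_ver()[0],
    sys.version_info[:2],
)
print("supported" if ok else "unsupported")
```

Passing the platform values in as arguments keeps the check easy to exercise on any machine.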

License

MIT. See LICENSE.
