sid732/LocalContextRouter: Preflight router that decides locally whether each document page reaches a multimodal model as text, on-device OCR, or an image. Cuts vision-token cost. macOS.
LocalContextRouter
Decide locally how each page of a document should reach a multimodal model:
as extracted text, on-device OCR, or a rendered image. That keeps you from
paying for vision tokens on pages that are only text.
A multimodal model reads a PDF by pulling its text and rendering every page to
an image, then billing for both. On a text page that image runs roughly
1,300 to 4,800 tokens while the same page as plain text is 400 to 800. For a
text-dominant document that is several times the cost for nothing extra.
LocalContextRouter does the cheap work on your machine first and tells you what
each page actually needs.
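To make "several times the cost" concrete, here is a small worked calculation using the ranges quoted above. It assumes the billed cost of a page is simply the text tokens plus the image tokens, which is a simplification for illustration; `overhead_ratio` is a hypothetical helper, not part of the package.

```python
# Token ranges quoted above for a single text page.
IMAGE_TOKENS = (1300, 4800)
TEXT_TOKENS = (400, 800)


def overhead_ratio(text_tokens: int, image_tokens: int) -> float:
    """Cost of sending text plus image, relative to sending text alone."""
    return (text_tokens + image_tokens) / text_tokens


# Mildest case: smallest image next to the largest text estimate.
low = overhead_ratio(TEXT_TOKENS[1], IMAGE_TOKENS[0])
# Harshest case: largest image next to the smallest text estimate.
high = overhead_ratio(TEXT_TOKENS[0], IMAGE_TOKENS[1])

print(f"{low:.1f}x to {high:.1f}x")  # 2.6x to 13.0x
```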
It does not call a model. It returns a per-page decision and the text to send;
your application still makes the call.
How it decides
For each page:
- A usable text layer that is mostly prose: use the extracted text.
- A text layer dominated by a table, chart, or diagram: send the page as an
  image, where the layout carries the meaning.
- No usable text, such as a scan or a photo: recognize it on-device with
  Apple's Vision framework.
The result also reports how many tokens you saved against sending every page as
an image.
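The three rules above can be sketched as a small decision function. This is an illustration of the rule order only, not the package's implementation: `Route`, `PageFacts`, `visual_fraction`, and the 0.5 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):  # hypothetical stand-in, not the library's Source type
    TEXT = "text"
    VISION = "vision"
    OCR = "ocr"


@dataclass
class PageFacts:  # hypothetical summary of what was found on a page
    has_text_layer: bool
    visual_fraction: float  # assumed share of the page taken by tables, charts, diagrams


VISUAL_THRESHOLD = 0.5  # assumed cutoff for "dominated by"


def choose_route(page: PageFacts) -> Route:
    # No usable text (scan or photo): recognize it on-device.
    if not page.has_text_layer:
        return Route.OCR
    # Text layer dominated by visual structure: send the page as an image.
    if page.visual_fraction > VISUAL_THRESHOLD:
        return Route.VISION
    # Otherwise the extracted text is enough.
    return Route.TEXT


print(choose_route(PageFacts(has_text_layer=True, visual_fraction=0.1)))  # Route.TEXT
```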
Install
pip install localcontextrouter
macOS only. The wheel bundles a universal (Apple Silicon and Intel) OCR binary,
so text recognition works with no extra setup.
Command line
localctx invoice.pdf
localctx invoice.pdf --json
localctx scan.png
localctx invoice.pdf prints each page, the source chosen for it, and the
tokens saved:
Document: invoice.pdf (3 pages)
Tokens saved vs sending every page as an image: 3085

Page 1 [text]
ACME Corp, Invoice #4471 ...

Page 2 [vision]
Quarterly results by segment ...

Page 3 [ocr]
SCANNED RECEIPT TOTAL 42.00
Add --vision-dir DIR to render the pages that should go to the model as images
into DIR; their paths are then listed in the output and the JSON.
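If you drive the command line from a script, a small helper can assemble the arguments. The sketch below uses only the flags described above (`--json` and `--vision-dir DIR`); `localctx_command` is a hypothetical helper, and placing both flags after the file path is an assumption based on the examples shown. The JSON layout is not described here, so the sketch stops at building the command.

```python
import shlex
from typing import Optional


def localctx_command(path: str, as_json: bool = False,
                     vision_dir: Optional[str] = None) -> list[str]:
    """Build the argument list for a localctx call (hypothetical helper)."""
    cmd = ["localctx", path]
    if as_json:
        cmd.append("--json")
    if vision_dir is not None:
        cmd += ["--vision-dir", vision_dir]
    return cmd


cmd = localctx_command("invoice.pdf", as_json=True, vision_dir="pages")
print(shlex.join(cmd))  # localctx invoice.pdf --json --vision-dir pages
# To run it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```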
In code
from localcontextrouter import route_pdf, Source

# send_image and send_text stand for your application's own model calls.
result = route_pdf("invoice.pdf")
for page in result.pages:
    if page.source is Source.VISION:
        send_image(page.index)  # the page's meaning is visual
    else:
        send_text(page.text)  # extracted or recognized text

print(result.tokens_saved)
Every page also carries an estimate of its cost both ways, as
page.tokens.text_tokens and page.tokens.image_tokens.
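One use for those per-page estimates is to total them across a document. The sketch below reads only the two attributes named above and runs on stand-in objects with made-up numbers, so it works without the package; `estimate_totals` is a hypothetical helper, and integer token counts are an assumption.

```python
from types import SimpleNamespace


def estimate_totals(pages):
    """Sum the per-page estimates: every page as text vs every page as an image."""
    as_text = sum(p.tokens.text_tokens for p in pages)
    as_image = sum(p.tokens.image_tokens for p in pages)
    return as_text, as_image


# Stand-in pages with made-up numbers; with the package installed you would
# pass route_pdf("invoice.pdf").pages instead.
demo_pages = [
    SimpleNamespace(tokens=SimpleNamespace(text_tokens=500, image_tokens=1500)),
    SimpleNamespace(tokens=SimpleNamespace(text_tokens=700, image_tokens=2000)),
]
as_text, as_image = estimate_totals(demo_pages)
print(as_text, as_image)  # 1200 3500
```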
As an agent skill
local-context-router is an Agent Skill in the open SKILL.md format, so it
works in Claude Code and other compatible agents. It lives in this repository
under .claude/skills/local-context-router; copy that folder into your agent's
skills directory:
cp -r .claude/skills/local-context-router ~/.claude/skills/
With the package installed, the agent runs the preflight on any PDF or image you
share, then uses the text for the cheap pages and attaches images only for the
visual ones.
Requirements and scope
- macOS 11 or newer. Recognition uses the Apple Vision framework and needs a
  normal macOS graphics environment; it will not run inside a headless sandbox
  that lacks one.
- Python 3.10 or newer.
- The scope is per-page routing, on-device OCR, and a token estimate. Retrieval
  over very large documents is out of scope.
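An application that may run on other platforms could check the first two requirements before relying on the package. This is a minimal sketch using only the standard library; `meets_requirements` and `parse_version` are hypothetical helpers, and the check does not detect whether a graphics environment is available.

```python
import platform
import sys


def parse_version(text: str) -> tuple:
    """Turn a dotted version such as '14.4.1' into (14, 4, 1); '' becomes ()."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def meets_requirements(system: str, mac_version: str, python_version: tuple) -> bool:
    """True on macOS 11 or newer with Python 3.10 or newer."""
    return (
        system == "Darwin"
        and parse_version(mac_version) >= (11,)
        and tuple(python_version[:2]) >= (3, 10)
    )


ok = meets_requirements(platform.system(), platform.mac_ver()[0], sys.version_info)
print("supported" if ok else "unsupported")
```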
License
MIT. See LICENSE.