We Ran a Complex Task — A LangChain Repo Analysis with Five Claude Models | CTRL NODE
Engineering · Jul 2, 2026 · 11 min read
Anthropic just shipped Claude Fable. We wanted a real answer to a practical question:
If you run the same complex engineering task on Opus, Fable, Sonnet, and Haiku — what do you actually get back?
Not a benchmark score. Not a vibe check. A full principal-engineer audit of a production open-source monorepo — with evidence, severity labels, and an execution plan.
We ran that experiment inside CTRL NODE: one prompt, five agents, five models, one cloned repository.
1. The goal: one hard task, five models
What we tested
We gave every model the same four-phase audit prompt and the same target: the LangChain Python monorepo (a large, mature library ecosystem, not a toy repo).
The prompt asks for:
Repository Map — explore first, judge second
Audit Report — architecture, security, tests, performance, deps, DX, docs (with file:line citations)
Improvement Strategy — themes, trade-offs, measurable “done” criteria
Task Plan — milestones M0–M3, quick wins, effort/risk/deps on each item
Every finding must be evidence-based. Guessing is explicitly forbidden.
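The evidence rule can be checked mechanically, at least in part. Below is a minimal sketch, assuming citations are written as `path/to/file.ext:LINE` or `path/to/file.ext:START-END`; the prompt only asks for file paths and line numbers, so that exact format, the helper names, and the sample paths are all assumptions for illustration.

```python
import re

# Matches citations such as `pkg/module.py:42` or `ci.yml:10-20`.
# The citation format is an assumption, not something the prompt specifies.
CITATION_RE = re.compile(
    r"(?P<path>[\w./-]+\.\w+):(?P<start>\d+)(?:-(?P<end>\d+))?"
)


def extract_citations(text):
    """Return (path, start_line, end_line) for every file:line citation in text."""
    citations = []
    for match in CITATION_RE.finditer(text):
        start = int(match.group("start"))
        # A single-line citation has no end group, so reuse the start line.
        end = int(match.group("end") or start)
        citations.append((match.group("path"), start, end))
    return citations


def uncited_findings(findings):
    """Return the findings (plain strings) that carry no file:line citation."""
    return [finding for finding in findings if not extract_citations(finding)]


if __name__ == "__main__":
    sample = [
        "Unsafe deserialization in pkg/load.py:120-134",  # hypothetical path
        "Tests feel thin",
    ]
    print(uncited_findings(sample))
```

A check like this only shows that a citation is present. Confirming that the cited file and line range exist in the clone would need a second pass against the work directory.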
That is a genuinely heavy task: thousands of files, real CI configs, security-sensitive deserialization paths, and god-class modules on hot code paths. It is the kind of work teams normally spread across several senior engineers.
Why Fable vs the rest
Fable is positioned as a strong reasoning model for long, structured work. We included it alongside:
| Model | Role in the experiment |
| --- | --- |
| Claude Opus 4.8 | Premium tier: threat modeling baseline |
| Claude Fable 5 | New tier: strategy & execution planning |
| Claude Sonnet 5 | Current Sonnet: primary audit pass |
| Claude Sonnet 4.6 | Previous Sonnet: ops / CI lens |
| Claude Haiku 4.5 | Fast tier: exploration & map |
The hypothesis was not “Fable wins everything.” It was: each tier sees different things, and Fable might be the best at turning findings into a shippable backlog.
The prompt
The full prompt lives in our catalog as langchain-prompt.md. Core instruction (abbreviated):
You are a world-class, principal-engineer-level software engineer and technical audit expert.
Perform an in-depth analysis of this code repository, provide an honest audit report,
and offer a prioritized, actionable improvement plan.

Follow four phases in order: Discovery → Audit → Strategy → Task Plan.
All judgments must cite real file paths and line numbers. Do not guess.
Deliverables requested per run:
audit-report-.md — full Markdown report
audit-report-.html — interactive dark-theme dashboard (tabs: Overview, Map, Audit, Strategy, Tasks)
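Since each run should produce both a Markdown report and an HTML dashboard, a quick completeness check helps before comparing models. This is a minimal sketch, assuming the reports follow an `audit-report-*.md` / `.html` naming pattern and sit together in one directory; the function name and the directory argument are hypothetical.

```python
from pathlib import Path


def missing_dashboards(output_dir):
    """Return names of Markdown reports that have no matching .html dashboard."""
    reports = sorted(Path(output_dir).glob("audit-report-*.md"))
    # with_suffix swaps ".md" for ".html" while keeping the rest of the name.
    return [r.name for r in reports if not r.with_suffix(".html").exists()]
```

An empty list means every Markdown report found has its dashboard next to it. It does not confirm that all five runs produced a report.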
Summary of the prompt: resumen-langchain-prompt.md.
2. How we set it up in CTRL NODE
We did not paste the prompt into five browser tabs. We ran it the way a team would: Bridge on a real machine, a project work directory pointing at the clone, one agent per model tier.
Prerequisites
Bridge (ctrlnode) installed and paired — see Bridge setup.
Claude SDK API key set in ~/.ctrlnode/.env (providers load automatically — no PROVIDERS flag needed):
ANTHROPIC_API_KEY=sk-ant-...
BASE_PATH=/home/you/workspace
LangChain cloned on the Bridge host under BASE_PATH (CTRL NODE does not git-clone for you; the work directory points at an existing folder).
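The prerequisites above can be sanity-checked before creating the project. This is a minimal sketch, not part of Bridge or CTRL NODE: it parses a `KEY=VALUE` env file and confirms that the clone directory exists under `BASE_PATH`. The function names are hypothetical, and the parser deliberately handles only plain `KEY=VALUE` lines.

```python
from pathlib import Path


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_prerequisites(env, clone_dir):
    """Return a list of problems; an empty list means the setup looks ready."""
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is missing")
    base = env.get("BASE_PATH")
    if not base:
        problems.append("BASE_PATH is missing")
        return problems
    base_path = Path(base).resolve()
    clone_path = Path(clone_dir).resolve()
    if not clone_path.is_dir():
        problems.append(f"{clone_path} is not an existing directory")
    elif base_path != clone_path and base_path not in clone_path.parents:
        problems.append(f"{clone_path} is not under BASE_PATH {base_path}")
    return problems
```

On the Bridge host this might be called as `check_prerequisites(parse_env((Path.home() / ".ctrlnode" / ".env").read_text()), "/home/you/workspace/langchain")`, where the clone path is an example rather than a required location.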
Project
In the web app: + NEW PROJECT
| Field | Value |
| --- | --- |
| NAME | langchain-audit-experiment |
| AGENT TYPE | Claude |
| WORK DIRECTORY | Browse → select the LangChain clone → USE THIS DIRECTORY |
| DESCRIPTION | Five-model audit benchmark |
The work directory is what lets agents read the full tree in WORK DIRECTORY task mode — the same scope a staff engineer would need.
Agents (one per model)
Team → + ADD AGENT — we created five agents on the same project:
| Agent name | MODEL field | Purpose |
| --- | --- | --- |
| audit-opus | claude-opus-4-8 | Threat & design review |
| audit-fable | claude-fable-5 | Strategy & task plan |
| audit-sonnet-5 | claude-sonnet-5 | Primary audit |
| audit-sonnet-46 | claude-sonnet-4-6 | CI / ops pass |
| audit-haiku | claude-haiku-4-5 | Fast map |
Models are selected in the MODEL combobox (synced from Bridge when online) or typed manually. Fable appears as claude-fable-5 in the Bridge model manifest (v2026.2.4+).
Optional AGENT SYSTEM INSTRUCTIONS were left minimal — we wanted the task prompt to carry the spec, not per-agent persona drift.
3. How we ran the prompt
For each agent, same procedure:
+ NEW TASK on the project
TITLE : LangChain principal audit —
INSTRUCTIONS : paste full contents of langchain-prompt.md
ASSIGN TO AGENT : pick the matching agent chip
OUTPUT MODE : WORK DIRECTORY (full repo scope; optional focus paths left empty)
NEW TASK → task lands in Backlog
RUN → dispatches to Bridge → agent moves to In progress
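The point of this procedure is that every agent receives an identical task and only the assigned agent and model vary. A minimal sketch of that invariant as plain data follows. `TaskSpec` is a hypothetical record that mirrors the form fields, not a CTRL NODE or Bridge API type, and the model-based title suffix is an assumption.

```python
from dataclasses import dataclass

# Agent names and model IDs as listed in the agents table.
AGENTS = {
    "audit-opus": "claude-opus-4-8",
    "audit-fable": "claude-fable-5",
    "audit-sonnet-5": "claude-sonnet-5",
    "audit-sonnet-46": "claude-sonnet-4-6",
    "audit-haiku": "claude-haiku-4-5",
}


@dataclass(frozen=True)
class TaskSpec:
    """Hypothetical record mirroring the NEW TASK form fields."""

    title: str
    instructions: str
    agent: str
    model: str
    output_mode: str = "WORK DIRECTORY"


def build_task_specs(prompt_text, agents=AGENTS):
    """Build one task per agent, all sharing the same instructions."""
    return [
        TaskSpec(
            # The title suffix is an assumption made for this sketch.
            title=f"LangChain principal audit ({model})",
            instructions=prompt_text,
            agent=agent,
            model=model,
        )
        for agent, model in agents.items()
    ]


if __name__ == "__main__":
    for spec in build_task_specs("<contents of langchain-prompt.md>"):
        print(spec.agent, spec.model, spec.output_mode)
```

Keeping the instructions and output mode fixed across all five specs is what makes the runs comparable. Any per-agent difference in those fields would confound the model comparison.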
Bridge delivers the task with repositoryPaths and repo dispatch context so the Claude SDK runs against the LangChain tree on disk. Outputs (audit-report-*.md / .html) were collected...