The Weaver Stack: one contract layer for safe LLM agents



Towards AI


The Weaver Stack: One Contract Layer for Safe LLM Agents

Four agent problems — tool explosion, context bloat, unsafe execution, flaky orchestration — fixed by one shared contract layer.

Diogo Santos

8 min read · 3 days ago


You wired up an LLM agent. It worked in the demo. Then you gave it real tools, and four things broke at once.

You injected every tool schema into the prompt, and 1,000+ definitions ate your context window and your token budget. Raw tool outputs (database rows, API blobs, file contents) came back huge and full of things the model should never see. There was no record of what ran, on whose authority, or why. And the moment you chained more than two steps, retries and partial failures turned the pipeline into something you couldn’t reproduce twice.

Here’s the part that stings: each of those four problems has been solved a dozen times. Everyone writes their own tool router, their own output filter, their own audit log, their own retry logic. They just don’t compose. Your router and your executor disagree about what a “routing decision” even is, so you write glue. Then you write glue for the glue.

weaver-spec is an attempt to fix the boundary, not to ship yet another framework.

The problem, concretely

The VISION doc names the four compounding failure modes directly:

Tool explosion: a production agent may have 1,000+ tools; injecting all their schemas every turn is expensive and noisy.
Context bloat: raw outputs are large, sometimes sensitive, and unsafe to hand an LLM unfiltered.
Unsafe execution: with no authorization layer, an agent can call any tool with any arguments, with no auditable trail.
Flaky orchestration: ad-hoc chaining produces non-deterministic, hard-to-debug pipelines.

The gap weaver-spec targets isn't any one of these. It's that they're each solved in isolation, so the pieces never interoperate.

The idea in one paragraph

weaver-spec is documentation plus contracts, not a runtime library. It's a single source of truth for the vocabulary, invariants, and language-agnostic schemas that a set of independently adoptable agent components share: a routing layer (contextweaver), an execution layer (agent-kernel), and an orchestration layer (ChainWeaver). Define a RoutingDecision, a Frame, a CapabilityToken once, as JSON Schema and as Python dataclasses, and any implementation that honors them composes with any other, or with your own code at whatever boundary you happen to cross.
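
To make "define it once" concrete, here is a minimal sketch of what shared contracts could look like as Python dataclasses. The field names (request_id, chosen_capability, expires_at, and so on) and the to_wire/from_wire helpers are assumptions made up for illustration; the real shapes live in the weaver-spec schemas and may look quite different.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


# NOTE: every field name below is hypothetical, not taken from weaver-spec.
@dataclass(frozen=True)
class RoutingDecision:
    """What a routing layer hands to an execution layer."""
    request_id: str
    chosen_capability: str
    arguments: dict


@dataclass(frozen=True)
class CapabilityToken:
    """A scoped, expiring grant to call one capability."""
    capability: str
    expires_at: str  # ISO 8601 timestamp, kept as a string so it is JSON-safe


@dataclass(frozen=True)
class Frame:
    """The LLM-safe view of a tool result."""
    capability: str
    summary: str
    handle: Optional[str] = None  # opaque reference to the full artifact


def to_wire(contract) -> str:
    """Serialize any contract object to JSON for another component."""
    return json.dumps(asdict(contract), sort_keys=True)


def from_wire(cls, payload: str):
    """Rebuild the same contract type on the receiving side."""
    return cls(**json.loads(payload))
```

The point of the sketch is the boundary: if a router serializes a RoutingDecision with to_wire and an executor rebuilds it with from_wire, both sides agree on what a "routing decision" is without any glue code in between.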

Figure: The four compounding agent problems and which Weaver layer owns each.

How it works

The stack is three runtime layers, each owning exactly one responsibility and talking only through the shared contracts:

Routing (contextweaver) compiles context and presents the LLM a small, pre-screened set of ChoiceCards instead of an unbounded tool list. It emits a RoutingDecision. It never executes a tool and never sees raw output.

Execution (agent-kernel) validates a CapabilityToken, authorizes the call (PolicyDecision), runs the tool, and firewalls the raw result into a Frame (the LLM-safe view) plus an optional Handle (an opaque reference to the full artifact). It emits TraceEvents. It owns the only firewall.

Orchestration (ChainWeaver) runs deterministic DAG flows and delegates every tool step back to the execution layer; it never calls tools directly.

What makes this more than a diagram is that the boundaries are pinned down as seven non-negotiable invariants in INVARIANTS.md. The load-bearing one is I-01: the LLM never sees raw tool output by default. Everything has to pass through the firewall and become a Frame first. I-02 says every execution is preceded by a PolicyDecision and followed by a TraceEvent: there is no "silent" execution. I-06 says tokens must be scoped and either single-use or expiring. An implementation that breaks any of I-01 through I-07 is, by definition, not spec-compliant.
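
The execution-layer rules above (I-01, I-02, I-06) can be sketched in a few dozen lines. This is not agent-kernel's API: the function names (authorize, firewall, execute), the field names, the dict-shaped trace events, and the truncation-based "firewall" are all stand-ins invented here to show the ordering of the steps. A real firewall would filter and redact rather than just truncate.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


# NOTE: all names and fields here are hypothetical, not the spec's schemas.
@dataclass
class CapabilityToken:
    capability: str        # the one capability this token is scoped to
    expires_at: datetime   # expiring, per I-06
    used: bool = False     # single-use in this sketch, also per I-06


@dataclass(frozen=True)
class PolicyDecision:
    allowed: bool
    reason: str


@dataclass(frozen=True)
class Frame:
    summary: str           # the only text the LLM gets to see (I-01)
    handle: str            # opaque key for the full artifact


def authorize(token: CapabilityToken, capability: str, now: datetime) -> PolicyDecision:
    """Decide whether this token may be used for this call."""
    if token.capability != capability:
        return PolicyDecision(False, "token not scoped to this capability")
    if token.used:
        return PolicyDecision(False, "token already used")
    if now >= token.expires_at:
        return PolicyDecision(False, "token expired")
    return PolicyDecision(True, "ok")


def firewall(raw: str, store: dict, max_chars: int = 40) -> Frame:
    """Store the raw result out of band and return a bounded view of it."""
    handle = f"artifact-{len(store)}"
    store[handle] = raw
    return Frame(summary=raw[:max_chars], handle=handle)


def execute(token: CapabilityToken, capability: str, tool: Callable[..., str],
            args: dict, store: dict, trace: list,
            now: Optional[datetime] = None) -> Optional[Frame]:
    """PolicyDecision first, then the tool, then a trace event (I-02)."""
    now = now or datetime.now(timezone.utc)
    decision = authorize(token, capability, now)
    frame = None
    try:
        if decision.allowed:
            token.used = True
            frame = firewall(tool(**args), store)
    finally:
        # Recorded whether the call was allowed, denied, or raised.
        trace.append({"capability": capability,
                      "allowed": decision.allowed,
                      "reason": decision.reason})
    return frame
```

Calling execute twice with the same token shows the shape of the guarantee: the first call returns a Frame whose summary is capped at 40 characters while the full output sits behind the handle, and the second call returns None with a "token already used" trace entry. Nothing runs silently, and the caller never touches the raw result.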

Figure: A single request flowing through the stack, plus the closed learning loop that feeds the next one.


Contracts come in two tiers. Core is minimal and stable: nine JSON Schemas (selectable_item, choice_card, routing_decision, capability, capability_token, policy_decision, frame, handle, trace_event). Changing one requires a major version bump and an ADR (architecture decision record). Extended (23 schemas at the time of writing) carries optional metadata: telemetry hints, UI hints, risk...
