Tools Are Harness Too

By Ian Johnson | Jun, 2026 | Medium


Ian Johnson

8 min read· Just now

I had a rule in my project CLAUDE.md for months that said “always check Code Health before committing.” The agent followed it about 70% of the time. The other 30%, it forgot, usually on the longer tasks, when its context window was filling up and the rules at the start of the prompt were losing weight against the code at the end. I added emphasis. The 70% became 80%. I added a section break. 85%. I rewrote the rule in capital letters with three exclamation points. Same 85%.

Then I gave the agent a Code Health tool. Not a rule. A tool: an MCP server entry point called code_health_safeguard that returned a structured analysis of the staged diff. The next day, the agent ran it before every commit. The day after that. The day after that. The compliance went to 100% and stayed there.

The difference between a rule and a tool is the difference between telling the agent what to do and making the right thing easier than the wrong thing. Both belong in a working harness. Tools are the half nobody talks about.

Rules tell, tools enforce

A rule is a paragraph. The agent reads it, weighs it against the other things in its context, and produces output. If the rule wins the weighing, the agent follows it. If something else wins (the code’s existing pattern, a more recently mentioned constraint, the user’s apparent intent in the prompt), the rule loses. The probability the rule wins is not 100%. It is some number less than 100%, and the number drops as the context grows. Rules in long-running agent sessions are like introductory remarks at a long meeting: present, recorded, mostly forgotten by hour three.

A tool is different. A tool is an affordance: a thing the agent can do, exposed at the same level as everything else the agent does. The agent’s working surface is “I can read files, edit files, run shell commands, and call these specific functions.” A tool slots into that list.
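As a rough illustration of what slotting into that list could look like, here is a minimal Python sketch that models the working surface as a registry of named callables. Everything in it is hypothetical: the ToolRegistry class, the stub functions, and their return values are assumptions made for illustration, not the API of any real agent harness or MCP SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolRegistry:
    """Hypothetical model of an agent's working surface: a set of named callables."""

    tools: Dict[str, Callable[..., object]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self.tools[name] = fn

    def surface(self) -> List[str]:
        # What the agent "sees": a flat list of capabilities, not paragraphs to weigh.
        return sorted(self.tools)

    def call(self, name: str, *args, **kwargs) -> object:
        return self.tools[name](*args, **kwargs)


def read_file(path: str) -> str:
    # Stub standing in for a built-in capability.
    return f"<contents of {path}>"


def code_health_safeguard(staged_diff: str) -> dict:
    # Stub: a real tool would analyze the diff; this one only records that it ran.
    return {"checked": True, "diff_chars": len(staged_diff)}


registry = ToolRegistry()
registry.register("read_file", read_file)
registry.register("code_health_safeguard", code_health_safeguard)
```

In this sketch the safeguard appears in registry.surface() next to read_file, which is the point: the check sits in the same list as everything else the agent can do, rather than in prose it has to remember.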
The agent doesn’t have to weigh a paragraph against another paragraph; it sees a function and decides whether to call it. The decision shifts from “should I obey this rule” to “should I use this capability.” The shift matters because the decision happens at a different layer in the agent’s reasoning. Capability decisions are cheaper than rule-weighing decisions. They survive the long-context drift.

When a tool beats a rule

Not every rule should become a tool. The translation has a cost. Tools take time to write, they introduce new surface area to maintain, and they only earn their place when they replace a rule that fails often enough to be worth replacing.

A rule belongs as a tool when:

- The rule produces a check the agent can run. “Always verify the schema before writing a migration” is a checkable property. “Be thoughtful about naming” is not.
- The check has a deterministic output. “Run the linter and pass” is a yes-or-no answer. “Match the team’s style” is a vibes call.
- The check fails silently when skipped. A rule the agent can ignore without anyone noticing is a rule that begs to become a tool. The tool is the gate that closes the silent skip.
- The check is run often. A check the agent should run on every commit is a tool. A check the agent should run on the rare cross-project migration is a skill or a rule.

For everything else, rules are still the right unit. Tone, naming conventions, design preferences, when-to-use-X-vs-Y judgment calls: those are paragraphs, not function calls. Trying to encode them as tools is the path to a brittle harness with thirty checks and no taste.

The Code Health example, end to end

To make the trade-off concrete, here’s the rule-to-tool conversion that started this post. The rule, in CLAUDE.md:

## Code Health is authoritative

Before suggesting a commit, run a Code Health check on the changed files. If any file regresses below 9.0, refactor before committing. Do not commit code with a Code Health below the project floor.

The rule is fine. It’s specific, it names the threshold, it tells the agent what to do. It just doesn’t fire reliably under load.

The tool is the CodeScene MCP server’s pre_commit_code_health_safeguard. It exposes itself to the agent as a callable function. The function takes the staged diff and returns a structured response: files checked, scores before and after, regressions flagged, suggested refactorings. The agent doesn’t have to remember to do the right thing; the right thing is one tool call away, and not calling it now looks conspicuous next to the call to git diff --cached that the agent makes anyway.

The rule still lives in CLAUDE.md, with one important change. It now says “call pre_commit_code_health_safeguard before suggesting a commit.” The rule names the tool. The tool does the work. The rule is shorter and the agent’s compliance is higher. That’s...
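To give a feel for the kind of structured response described above, here is a small Python sketch of a commit gate built around the 9.0 floor from the CLAUDE.md rule. It is an assumption-laden illustration, not CodeScene's actual schema or implementation: the FileResult type, the safeguard_report function, the field names, and the example scores are all invented for this sketch.

```python
from dataclasses import dataclass
from typing import Dict

PROJECT_FLOOR = 9.0  # threshold named in the CLAUDE.md rule


@dataclass(frozen=True)
class FileResult:
    path: str
    before: float
    after: float

    @property
    def regressed(self) -> bool:
        return self.after < self.before

    @property
    def below_floor(self) -> bool:
        return self.after < PROJECT_FLOOR


def safeguard_report(before: Dict[str, float], after: Dict[str, float]) -> dict:
    """Hypothetical response shape; the real tool's fields may differ."""
    results = [
        # A file with no earlier score (newly added) is compared against itself.
        FileResult(path, before.get(path, score), score)
        for path, score in sorted(after.items())
    ]
    return {
        "files_checked": [r.path for r in results],
        "scores": {r.path: {"before": r.before, "after": r.after} for r in results},
        "regressions": [r.path for r in results if r.regressed],
        "blocking": [r.path for r in results if r.below_floor],
        "ok_to_commit": not any(r.below_floor for r in results),
    }


# Illustrative, made-up scores for two staged files.
report = safeguard_report(
    before={"billing.py": 9.6, "auth.py": 9.4},
    after={"billing.py": 8.7, "auth.py": 9.4},
)
```

Under these assumptions, report["ok_to_commit"] is False and report["blocking"] names billing.py, so the agent receives a yes-or-no answer plus the file to refactor instead of a paragraph to interpret.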
