Designing Loops That Prompt Coding Agents: The Six I Run

Designing Loops That Prompt Coding Agents: The Six I Actually Run | Cameron Westland

Jun 09, 2026

On June 7, Peter Steinberger tweeted: “Here’s your monthly reminder that you shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.” Eight million views. Around the same time, Boris Cherny, the creator of Claude Code, said on Acquired Unplugged: “I don’t prompt Claude anymore. I have loops running. They’re the ones prompting Claude and figuring out what to do. My job is to write loops.”

And then everyone lost their minds, because two of the most-watched people in this space said “write loops” and neither showed a loop.

I’ve been running loops for months, almost entirely as Codex app automations. This post is the concrete version: six loops, what each one does, and the actual configs, skills, and CI workflow behind them, all published in a public snapshot repo: camwest/agent-skills. Every loop I run automates a feedback step I was already doing manually and keeps me at the triage gate. Full autonomy isn’t the point.

Three things people mean by “loop”

Part of the uproar is that “loop” is at least three different things. There’s the autonomous task loop, “keep going until the work is done”: Geoffrey Huntley’s Ralph in its purest form (while :; do cat PROMPT.md | claude-code ; done), and its productized descendant, the /goal command that Codex and now Claude Code both ship. There’s the scheduled or event-driven loop, work that runs without you in the chair: the granddaddy is HEARTBEAT.md in OpenClaw, Steinberger’s own framework, a checklist the agent reconsiders every 30 minutes; its descendants are Codex automations and Claude Code routines. And there’s orchestration fan-out, dynamic workflows in Claude Code: MapReduce with agents, closer to the actor model than to any loop.

My read: Steinberger and Cherny are talking about the second kind, wired into the first. The loops below are scheduled and event-driven on the outside, and several of them run experiment-style inner loops once they wake up.

The gateway drug: babysitting PRs

I already had AI code review on every PR: Claude reviewing first, then Codex’s built-in review, and finally a custom GitHub Action, which I added because I wanted control over what it saw. The Action pulls in the full GitHub conversation context plus the patch and asks what the issues are.

So my actual workflow was: submit a PR from Claude Code or Codex, wait for the review to come back, then copy-paste the review text into the agent. Sometimes screenshots. One day I got annoyed enough to ask, “can’t you just use the gh client and read the review status yourself?” That worked, so the next iteration was obvious: “can’t you just keep checking and tell me when it’s done?” It’s a GitHub Action. It can be watched.

That request turned the review into a loop, and the loop is now a skill in our repo: graphite-pr-babysit. Submit a PR, say “babysit this,” and the agent watches the review run and CI, fixes what’s fixable, and comes back when it’s clean or stuck.

The profile of this loop generalizes: it watches for a state change in an external system, wakes on it, pulls in new context, analyzes, and does a job. The job ends at a triage decision we’ve been tuning as a team: accept the feedback, push back on it, or escalate to a human. Once I saw that shape, I started seeing places to apply it everywhere. The harnesses see it too: the Codex team ships a babysit-pr skill in their own repo, and Claude Code’s scheduled-tasks docs name babysitting a PR as a headline use case for /loop.
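A minimal sketch of that shape, not the graphite-pr-babysit skill itself. Every name here (Snapshot, Triage, watch, and the injected poll/decide/act callables) is hypothetical; in practice poll would wrap something like a gh call and decide would be the agent's judgment.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Triage(Enum):
    ACCEPT = "accept"        # apply the feedback
    PUSH_BACK = "push_back"  # reply explaining why not
    ESCALATE = "escalate"    # hand the decision to a human


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical view of the external system, e.g. a PR's review and CI."""
    state: str    # e.g. "pending", "changes_requested", "clean"
    context: str  # new review text, logs, and so on


def watch(
    poll: Callable[[], Snapshot],
    decide: Callable[[Snapshot], Optional[Triage]],
    act: Callable[[Snapshot, Triage], None],
    is_done: Callable[[Snapshot], bool],
    sleep: Callable[[float], None],
    interval_s: float = 60.0,
    max_polls: int = 100,
) -> Optional[Snapshot]:
    """Poll until the system is done, a human is needed, or polls run out."""
    last_state = None
    for _ in range(max_polls):
        snap = poll()
        if snap.state != last_state:       # wake only on a state change
            last_state = snap.state
            if is_done(snap):
                return snap                # clean: report back
            verdict = decide(snap)         # analyze the new context
            if verdict is not None:
                act(snap, verdict)         # do the job
                if verdict is Triage.ESCALATE:
                    return snap            # stuck: report back
        sleep(interval_s)
    return None                            # gave up after max_polls
```

Passing sleep and poll in as arguments keeps the loop testable without a real PR; a live run would pass time.sleep and a function that reads the review status.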

Inner loops: make the agent run the experiment

I wrote in March about pointing autoresearch at a slow Python path: 49 experiments in an hour, p95 from 339ms to 34ms, $24. That post was about Karpathy’s autoresearch concept and pi-autoresearch; the loop was “try an idea, measure it, keep what works, discard what doesn’t.”
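The "try, measure, keep or discard" loop can be sketched in a few lines. This is an illustration of the shape only, not how autoresearch or pi-autoresearch is implemented; experiment_loop, ideas, and measure are hypothetical names, and lower scores are assumed to be better (as with p95 latency).

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

C = TypeVar("C")  # whatever a "candidate" is: a config, a patch, a code variant


def experiment_loop(
    baseline: C,
    ideas: Iterable[Callable[[C], C]],
    measure: Callable[[C], float],
) -> Tuple[C, float, List[Tuple[int, float, bool]]]:
    """Greedy loop: apply each idea to the current best, keep it only if it measures better."""
    best, best_score = baseline, measure(baseline)
    log: List[Tuple[int, float, bool]] = []
    for i, idea in enumerate(ideas):
        candidate = idea(best)        # try an idea
        score = measure(candidate)    # measure it
        kept = score < best_score     # lower is better (assumption)
        if kept:
            best, best_score = candidate, score  # keep what works
        log.append((i, score, kept))  # discarded ideas stay in the log
    return best, best_score, log
```

The log matters as much as the winner: it is the record of which ideas were tried and rejected, which is what stops the next run from retrying them.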

The same shape now runs against a harder target: the behavior of the agent we ship in our product. When something goes wrong (a weird trace in Braintrust, user feedback in Slack, or a failure I hit myself), I kick off a worktree, paste in the trace, and invoke investos-runtime-test-loop. The skill forces a discipline that the model’s naive intuition fights against. Left alone, a model will hardcode “never do X, Y, Z” into the system prompt and overfit to the one trace you showed it. The skill’s references distill the research on why that fails, and the loop contract requires a hypothesis and a matrix: the original failing case, an adjacent positive that should take the same path, and a counterexample that should take a different...
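One way to picture a hypothesis-plus-matrix contract is as a small data structure with a gate. This is a guess at the shape, not the contents of investos-runtime-test-loop: Case, Experiment, evaluate, and the behavior labels are all hypothetical, and run stands in for replaying a prompt against the agent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

REQUIRED_ROLES = {"original", "adjacent_positive", "counterexample"}


@dataclass(frozen=True)
class Case:
    role: str      # one of REQUIRED_ROLES
    prompt: str    # hypothetical input replayed against the agent
    expected: str  # label for the behavior we expect to observe


@dataclass(frozen=True)
class Experiment:
    hypothesis: str
    matrix: List[Case]


def evaluate(exp: Experiment, run: Callable[[str], str]) -> Dict[str, bool]:
    """Refuse to run without a hypothesis and a full matrix, then check every case."""
    if not exp.hypothesis.strip():
        raise ValueError("state a hypothesis before changing anything")
    missing = REQUIRED_ROLES - {c.role for c in exp.matrix}
    if missing:
        raise ValueError(f"matrix is missing roles: {sorted(missing)}")
    return {c.role: run(c.prompt) == c.expected for c in exp.matrix}


def accepted(results: Dict[str, bool]) -> bool:
    """A change passes only if every case in the matrix behaves as expected."""
    return all(results.values())
```

An overfit fix tends to pass the original case and fail the counterexample, which is exactly what a single-trace check cannot show.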
