Lessons from my overly-introspective, self-improving coding agent | ngrok blog


A year or two ago, everyone was building coding agents. Now everyone's building coding agents that modify themselves... and I wanted to join the fun and ask:

What happens when you tell a coding agent to think about what it's done and do better next time?

So, I built bmo: a self-improving coding agent, and then used it (almost) exclusively as my coding agent for two weeks. It's been wildly nifty to me—like, take me back to tearing apart the family computer's partition to install Debian from a CD that came in the back of some book my friend bought at Borders Books kind of novel and nifty—and is exposing a joy of computing that I haven't felt in quite a while.

Here's what I found.

A preamble on bmo's bootstraps

I wanted to design an agent harness on the principle of immediate action.

That starts with a basic agentic loop and access to three tools: run_command, load_skill, and reload_tools. I'd built other coding agents in the past and gave them access to more specific tools like write_file and list_cwd, but I've found that coding agents really only need access to shell commands to work as expected. I also wanted to give bmo a challenge: Instead of using run_command "fresh" with every session, I wanted to see how it could optimize its own "harnesses" for safe and efficient use of common Linux tools.
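To make the three-tool surface concrete, here is a minimal Python sketch of what such a harness could look like. It is not bmo's actual code: the `skills` and `tools` directory names, the return shapes, and the convention that each tool module exposes a callable named `run` are all assumptions made for illustration.

```python
import importlib.util
import subprocess
from pathlib import Path

# Hypothetical locations; the post does not say where bmo keeps these.
SKILLS_DIR = Path("skills")
TOOLS_DIR = Path("tools")


def run_command(command: str, timeout: int = 60) -> dict:
    """Run a shell command and return its exit code and captured output."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=timeout
    )
    return {
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }


def load_skill(name: str, skills_dir: Path = SKILLS_DIR) -> str:
    """Return the text of a skill document so it can be added to the context."""
    return (skills_dir / f"{name}.md").read_text(encoding="utf-8")


def reload_tools(tools_dir: Path = TOOLS_DIR) -> dict:
    """Re-execute every module in tools_dir and collect the tools they expose."""
    tools = dict(BUILTIN_TOOLS)
    if tools_dir.is_dir():
        for path in sorted(tools_dir.glob("*.py")):
            spec = importlib.util.spec_from_file_location(path.stem, path)
            module = importlib.util.module_from_spec(spec)
            # Executing the file again picks up edits made during the session.
            spec.loader.exec_module(module)
            # Assumed convention: each tool module defines a callable `run`.
            if callable(getattr(module, "run", None)):
                tools[path.stem] = module.run
    return tools


# The three tools named in the post; anything else is built by the agent later.
BUILTIN_TOOLS = {
    "run_command": run_command,
    "load_skill": load_skill,
    "reload_tools": reload_tools,
}
```

In this sketch, an agent that writes a new file into the tools directory and then calls `reload_tools` gets that tool back in the returned dictionary, ready to use in the same session.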

Self-improvement happens across four loops. The first is a "build it now" directive that interrupts the task to build a tool immediately, add it to a hot-reloadable library, and use it right away. The second is active learning capture, which logs corrections and preferences. The third is self-reflection at session end. The fourth is the battery change every 10 sessions, where bmo says, "hey. i need to change my batteries, ok? one sec...", analyzes those 10 sessions, identifies opportunities, and builds improvements from the backlog.
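The session-end half of these loops comes down to a small amount of bookkeeping. The Python sketch below shows one way to do it, not how bmo does it: the function name `end_session`, the JSON state file, and the shape of a reflection are hypothetical. Only the interval of 10 sessions comes from the description above.

```python
import json
from pathlib import Path

BATTERY_CHANGE_INTERVAL = 10  # from the post: a battery change every 10 sessions


def end_session(state_path: Path, reflection: dict) -> list:
    """Store one session's self-reflection.

    Returns the full batch of reflections when a battery change is due,
    and an empty list otherwise.
    """
    pending = []
    if state_path.exists():
        pending = json.loads(state_path.read_text(encoding="utf-8"))

    pending.append(reflection)
    due = len(pending) >= BATTERY_CHANGE_INTERVAL

    # Once a batch is handed off for analysis, start counting from zero again.
    state_path.write_text(json.dumps([] if due else pending), encoding="utf-8")
    return pending if due else []
```

A caller would pass something like `{"went_well": "...", "was_slow": "...", "next_time": "..."}` after each session and start the maintenance pass whenever the returned list is non-empty.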

┌──────────────────┐
│   User request   │
└────────┬─────────┘
┌─────────────────────────────────────────────────────────────────────┐
│  ACTIVE SESSION                                                     │
│                                                                     │
│  ┌─────────────┐    friction?    ┌──────────────────┐               │
│  │  Execute    │───────yes──────▶│ 1. BUILD IT NOW  │               │
│  │  the task   │                 │    Build tool    │               │
│  │             │◀────continue────│    Hot-reload    │               │
│  └──────┬──────┘                 │    Validate      │               │
│         │                        └──────────────────┘               │
│         │ correction?                                               │
│         │ preference?            ┌──────────────────┐               │
│         └────────yes────────────▶│ 2. ACTIVE        │               │
│                                  │    LEARNING      │──▶ session log│
│                                  └──────────────────┘               │
└────────┬────────────────────────────────────────────────────────────┘
         │ session ends
┌──────────────────────┐                    ┌───────────────────────┐
│ 3. SELF-REFLECTION   │                    │ 4. BATTERY CHANGE     │
│    What went well?   │ every 10 sessions  │    Analyze sessions   │
│    What was slow?    │───────────────────▶│    Update WORKING_    │
│    Next time?        │                    │    MEMORY.md          │
└────────┬─────────────┘                    │    Build from         │
         │                                  │    OPPORTUNITIES.md   │
         │ session log                      └───────────┬───────────┘
         │                                              │
         ▼                                tools, skills │

I had wanted to start with only the "build it now" loop, but everything else became necessary after many long conversations with bmo and some hard-won lessons. On that note...

What bmo learned

In our time together, bmo went through 8 maintenance passes and nearly 100 active sessions across multiple systems, which resulted in 11 new tools and 7 skills. I used bmo and its tools for everything: building parts of the new ngrok.com website, writing shell scripts for my dotfiles, scaffolding a new Astro site, debugging AMD graphics driver crashes, the whole kit and caboodle. It really has been my daily driver.

Knowing something isn't the same as doing it

Early on, bmo and I worked on a learning-event-capture skill designed for recognizing when I express corrections and personal preferences, or when bmo itself noticed a pattern worth saving. A truncated version is below, but you can see the whole skill in bmo's repo.

    # Learning Event Capture

    ## When to Use
    Continuously during every session. Learning events are corrections, preferences,
    or patterns that should inform future behavior.

    ## Recognition Cues

    ### Corrections (type: "correction")
    - User says "no", "not that", "wrong", "actually..."
    - User repeats an instruction you missed
    - User undoes something you did
    - User expresses frustration or disappointment
    - User provides the correct answer after your attempt

    ### Preferences (type: "preference")
    - User specifies a style choice ("use TypeScript", "keep it concise")
    - User chooses between options you offered
    - User describes their workflow or habits
    - User says "I always...", "I prefer...", "I like..."

    ### Patterns (type: "pattern")
    - User does the same type of task repeatedly
    - User follows a consistent workflow shape
    - You notice a recurring problem type or domain

    ## Best Practices

    1. **Log immediately when you detect a cue**
       - Call `log_learning_event` right away, don't wait for session end
       - Include specific context (what task, what...
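The skill refers to a `log_learning_event` tool but the excerpt does not show it. As a rough illustration only, a tool like that could append one JSON line per event to a session log. Everything about the signature below (the `log_path`, `summary`, and `context` parameters and the JSON Lines format) is an assumption. Only the tool name and the three event types come from the skill text.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# The three event types named in the skill.
EVENT_TYPES = {"correction", "preference", "pattern"}


def log_learning_event(
    log_path: Path, event_type: str, summary: str, context: str = ""
) -> dict:
    """Append one learning event to the session log as a JSON line."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown learning event type: {event_type!r}")

    event = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "type": event_type,
        "summary": summary,
        "context": context,  # hypothetical field: what task was under way
    }
    # Appending keeps the write cheap enough to do the moment a cue is noticed.
    with log_path.open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(event) + "\n")
    return event
```

An append-only log suits the "log immediately" practice: each call is independent, so a crash or an abrupt session end loses at most the event being written.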
