Thoughts on starting new projects with LLM agents

Thoughts on starting new projects with LLM agents - Eli Bendersky's website

Toggle navigation

Eli Bendersky's website

About

Projects

Archives

A few months ago I wrote about using LLM agents to help restructuring one of my Python projects. It's worth beginning by saying that the rewrite has been successful by all reasonable measures; I've been able to continue maintaining that project since then without an issue.

In this post, I want to discuss another project I've recently completed with significant help from agents: watgo. In this project many things are different; most notably, it's a from-scratch project rather than a rewrite, and it uses a different programming language (Go). This post describes my experience working on the project, and some lessons learned along the way.

The process

This is a new project, so it required extensive design. I began by iterating on the design with the agent, with a sketch of the API. For this purpose, I recommend using a Markdown file committed into the repository for future reference.

After that, I started asking the agent to write CLs [1] in a logical order that made sense to me, keeping them small and reviewable (more on this in the next section). Sometimes it's not easy to have a small CL, and multiple rounds of revision may confuse the agent; in this case, I commit the CL and then go back and ask the agent to modify or refactor the code, as much as needed, with separate CLs. In the worst case, the whole sequence can be reverted if I feel we've taken the wrong direction (branches could also be helpful here for more complicated scenarios).

This point is worth reiterating: sometimes a single CL is a huge step forward, but requires lots of review, cleanup and refactoring to be viable. I've had multiple instances where an agent produced several days of work in a single CL, but I then spent hours instructing it to clean up and refactor. Overall, it's still a productivity gain, just not as much as some pundits would like us to believe.

Keeping the human in the loop

Given the current state of agent capabilities, I think it's worth splitting projects into two categories:

Low importance / prototype / throw away projects where deep code understanding is unnecessary. These can be "vibe-coded" (submitting agent code without even reviewing it).

High importance projects that I actually want to maintain; here, vibe-coding is ill advised and I insist on reviewing and guiding all code the agent writes before it's submitted (or shortly after, as discussed above).

The watgo projects is a clear example of (2): I certainly intend to maintain this project in the long term, so I insist on code that I understand. With very few exceptions, no code gets in without full review and often multiple rounds of revisions.

Even if the cost for writing code went down, maintaining a project is so much more than that. It's triaging and fixing bugs, it's thinking through what needs to be done rather than how to do it, it's keeping the code healthy over time, and so on. As Brian Kernighan said:

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?

Maybe at some point agents will become good enough that projects in category (2) can be implemented and maintained completely autonomously. Maybe. But we're certainly not there yet. My hunch is that getting there will require crossing the AGI line [2], after which little in our world remains certain.

Practical workflow

If you're using an agent to send an actual PR and only review that, it's difficult to be disciplined enough to actually perform a thorough review. I find the following method to be more reliable:

I use a CLI agent running locally in my repository, and ask it to update the code there. In parallel, I have a VSCode window open in the same project, where I can:

Review the agent's changes using VSCode's diff view

Make my own tweaks and code changes if needed

Once I'm pleased with the change, I manually create a commit.

Keeping the CLs small

As mentioned above, it's imperative to keep making progress in small chunks, with small enough CLs that a human can fully understand in a single review. It's very tempting to sprint ahead submitting thousands of lines of code every day, but this temptation has to be avoided. Coding with an agent is like speed-reading; yes, you're making more progress, but comprehension suffers the faster you go.

Particularly for refactoring, agents still take the shortest route to destination. It's important to guide them to think about the "big picture" at all times, find all instances where X is better done as Y, not just a single place noticed during a review. This is why it's sometimes OK to have a CL submitted before you fully agree with everything, and go back to it later for several refactoring rounds. Source control works amazingly well when pair-coding with agents.

Testing...

Thoughts on starting new projects with LLM agents

Related Articles

The Newest Instagram "Exploit" Is the Goofiest I've Seen

It's Not Just X. It's Y

Amazon, Facebook, FBI have access to a private intelligence-sharing network

Show HN: GoPeek – open links in live mini browser windows without new tabs

Agent Memory: An Anatomy