Thoughts on starting new projects with LLM agents - Eli Bendersky's website
A few months ago I wrote about using LLM agents to help restructure one of my Python projects. It's worth beginning by saying that the rewrite has been successful by all reasonable measures; I've been able to continue maintaining that project since then without an issue.
In this post, I want to discuss another project I've recently completed with significant help from agents: watgo. In this project many things are different; most notably, it's a from-scratch project rather than a rewrite, and it uses a different programming language (Go). This post describes my experience working on the project, and some lessons learned along the way.
The process
This is a new project, so it required extensive design. I began by iterating on the design with the agent, starting with a sketch of the API. I recommend keeping this design in a Markdown file committed into the repository for future reference.
After that, I started asking the agent to write CLs [1] in an order that made logical sense to me, keeping them small and reviewable (more on this later in the post). Sometimes it's not easy to keep a CL small, and multiple rounds of revision may confuse the agent; in this case, I commit the CL and then go back and ask the agent to modify or refactor the code, as much as needed, in separate CLs. In the worst case, the whole sequence can be reverted if I feel we've taken the wrong direction (branches can also help in more complicated scenarios).
This point is worth reiterating: sometimes a single CL is a huge step forward but requires lots of review, cleanup and refactoring to be viable. I've had multiple instances where an agent produced several days' worth of work in a single CL, and I then spent hours instructing it to clean up and refactor. Overall, it's still a productivity gain, just not as large as some pundits would like us to believe.
Keeping the human in the loop
Given the current state of agent capabilities, I think it's worth splitting projects into two categories:
1. Low-importance, prototype, or throwaway projects where deep code understanding is unnecessary. These can be "vibe-coded" (submitting agent code without even reviewing it).
2. High-importance projects that I actually want to maintain; here, vibe-coding is ill-advised, and I insist on reviewing and guiding all code the agent writes before it's submitted (or shortly after, as discussed above).
The watgo project is a clear example of (2): I certainly intend to maintain this project in the long term, so I insist on code that I understand. With very few exceptions, no code gets in without a full review and often multiple rounds of revision.
Even if the cost of writing code has gone down, maintaining a project is so much more than that. It's triaging and fixing bugs, it's thinking through what needs to be done rather than how to do it, it's keeping the code healthy over time, and so on. As Brian Kernighan said:
Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?
Maybe at some point agents will become good enough that projects in category (2) can be implemented and maintained completely autonomously. Maybe. But we're certainly not there yet. My hunch is that getting there will require crossing the AGI line [2], after which little in our world remains certain.
Practical workflow
If you're using an agent to send an actual PR and only review that, it's difficult to be disciplined enough to perform a thorough review. I find the following method more reliable:
I use a CLI agent running locally in my repository, and ask it to update the code there. In parallel, I have a VSCode window open in the same project, where I can:
Review the agent's changes using VSCode's diff view
Make my own tweaks and code changes if needed
Once I'm pleased with the change, I manually create a commit.
Keeping the CLs small
As mentioned above, it's imperative to keep making progress in small chunks, with CLs small enough that a human can fully understand each one in a single review. It's very tempting to sprint ahead submitting thousands of lines of code every day, but this temptation has to be resisted. Coding with an agent is like speed-reading; yes, you're making more progress, but comprehension suffers the faster you go.
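The post doesn't prescribe a size limit or any tooling for this. As a hypothetical sketch, one could measure a pending CL by summing the added and deleted line counts that `git diff --numstat` reports (one `added<TAB>deleted<TAB>path` line per file, with `-` in both count columns for binary files). The `clSize` helper, the `maxReviewableLines` threshold of 400, and the sample file paths are illustrative assumptions, not part of watgo:

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// maxReviewableLines is a hypothetical threshold (an assumption, not from
// the post); pick whatever your reviewers can comfortably read in one go.
const maxReviewableLines = 400

// clSize sums added and deleted lines from `git diff --numstat` output.
// Each line has the form "added<TAB>deleted<TAB>path"; binary files
// report "-" for both counts and are skipped here.
func clSize(numstat string) (int, error) {
	total := 0
	sc := bufio.NewScanner(strings.NewReader(numstat))
	for sc.Scan() {
		fields := strings.SplitN(sc.Text(), "\t", 3)
		if len(fields) < 3 {
			continue // blank or malformed line
		}
		if fields[0] == "-" || fields[1] == "-" {
			continue // binary file: no line counts available
		}
		added, err := strconv.Atoi(fields[0])
		if err != nil {
			return 0, err
		}
		deleted, err := strconv.Atoi(fields[1])
		if err != nil {
			return 0, err
		}
		total += added + deleted
	}
	return total, sc.Err()
}

func main() {
	// Hypothetical sample, shaped like `git diff --numstat HEAD` output.
	sample := "120\t30\tbinary/encode.go\n45\t0\tbinary/encode_test.go\n-\t-\ttestdata/module.wasm\n"
	n, err := clSize(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("CL touches %d lines\n", n)
	if n > maxReviewableLines {
		fmt.Println("consider splitting this CL")
	}
}
```

In practice you would feed it the real output of `git diff --numstat` (for example via `os/exec`). A raw line count is only a rough proxy for reviewability: a 50-line change to tricky logic can take longer to understand than a 300-line mechanical rename.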
Particularly for refactoring, agents still take the shortest route to the destination. It's important to guide them to think about the "big picture" at all times: to find all instances where X is better done as Y, not just the single place noticed during a review. This is why it's sometimes OK to submit a CL before you fully agree with everything, and go back to it later for several rounds of refactoring. Source control works amazingly well when pair-coding with agents.
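For Go code, finding every instance of a pattern, rather than only the one a reviewer happened to spot, can be checked mechanically. A minimal sketch, assuming the pattern of interest is "every call to a function with a given name", uses the standard library's `go/parser` and `go/ast` packages; `findCalls` and the sample source are hypothetical illustrations, not taken from the post or from watgo:

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// findCalls returns the positions of every call to a function or method
// named name in the given Go source, not just the first one found.
func findCalls(src, name string) ([]token.Position, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "input.go", src, 0)
	if err != nil {
		return nil, err
	}
	var positions []token.Position
	// ast.Inspect walks the whole tree depth-first, in source order.
	ast.Inspect(file, func(n ast.Node) bool {
		call, ok := n.(*ast.CallExpr)
		if !ok {
			return true
		}
		switch fn := call.Fun.(type) {
		case *ast.Ident: // plain call: name(...)
			if fn.Name == name {
				positions = append(positions, fset.Position(call.Pos()))
			}
		case *ast.SelectorExpr: // qualified call: pkg.name(...) or x.name(...)
			if fn.Sel.Name == name {
				positions = append(positions, fset.Position(call.Pos()))
			}
		}
		return true
	})
	return positions, nil
}

func main() {
	// Hypothetical input with two call sites in different functions.
	src := `package demo

import "fmt"

func a() { fmt.Sprintf("%d", 1) }

func b() string { return fmt.Sprintf("%s", "x") }
`
	positions, err := findCalls(src, "Sprintf")
	if err != nil {
		panic(err)
	}
	for _, p := range positions {
		fmt.Println(p) // file:line:column of each call
	}
}
```

This matches purely by name with no type information, so a method of the same name on an unrelated type is also reported; resolving identifiers precisely would require `go/types`. Even so, a complete list of candidates makes it easier to ask the agent to update every site rather than just one.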
Testing...