Thoughts on starting new projects with LLM agents

Thoughts on starting new projects with LLM agents - Eli Bendersky's website

A few months ago I wrote about using LLM agents to help restructure one of my Python projects. It's worth beginning by saying that the rewrite has been successful by all reasonable measures; I've been able to continue maintaining that project since then without an issue.

In this post, I want to discuss another project I've recently completed with significant help from agents: watgo. In this project many things are different; most notably, it's a from-scratch project rather than a rewrite, and it uses a different programming language (Go). This post describes my experience working on the project, and some lessons learned along the way.

The process

This is a new project, so it required extensive design. I began by iterating on the design with the agent, with a sketch of the API. For this purpose, I recommend using a Markdown file committed into the repository for future reference.

After that, I started asking the agent to write CLs [1] in a logical order that made sense to me, keeping them small and reviewable (more on this in the next section). Sometimes it's not easy to have a small CL, and multiple rounds of revision may confuse the agent; in this case, I commit the CL and then go back and ask the agent to modify or refactor the code, as much as needed, with separate CLs. In the worst case, the whole sequence can be reverted if I feel we've taken the wrong direction (branches could also be helpful here for more complicated scenarios).

This point is worth reiterating: sometimes a single CL is a huge step forward, but requires lots of review, cleanup and refactoring to be viable. I've had multiple instances where an agent produced several days of work in a single CL, but I then spent hours instructing it to clean up and refactor. Overall, it's still a productivity gain, just not as much as some pundits would like us to believe.

Keeping the human in the loop

Given the current state of agent capabilities, I think it's worth splitting projects into two categories:

1. Low importance / prototype / throwaway projects where deep code understanding is unnecessary. These can be "vibe-coded" (submitting agent code without even reviewing it).

2. High importance projects that I actually want to maintain; here, vibe-coding is ill-advised and I insist on reviewing and guiding all code the agent writes before it's submitted (or shortly after, as discussed above).

The watgo project is a clear example of (2): I certainly intend to maintain this project in the long term, so I insist on code that I understand. With very few exceptions, no code gets in without full review and often multiple rounds of revisions.

Even if the cost of writing code has gone down, maintaining a project is so much more than that. It's triaging and fixing bugs, it's thinking through what needs to be done rather than how to do it, it's keeping the code healthy over time, and so on. As Brian Kernighan said:

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you're as clever as you can be when you write it, how will you ever debug it?

Maybe at some point agents will become good enough that projects in category (2) can be implemented and maintained completely autonomously. Maybe. But we're certainly not there yet. My hunch is that getting there will require crossing the AGI line [2], after which little in our world remains certain.

Practical workflow

If you're using an agent to send an actual PR and only review that, it's difficult to be disciplined enough to actually perform a thorough review. I find the following method to be more reliable:

I use a CLI agent running locally in my repository, and ask it to update the code there. In parallel, I have a VSCode window open in the same project, where I can:

Review the agent's changes using VSCode's diff view

Make my own tweaks and code changes if needed

Once I'm pleased with the change, I manually create a commit.

Keeping the CLs small

As mentioned above, it's imperative to keep making progress in small chunks, with CLs small enough that a human can fully understand each one in a single review. It's very tempting to sprint ahead submitting thousands of lines of code every day, but this temptation has to be resisted. Coding with an agent is like speed-reading; yes, you're making more progress, but comprehension suffers the faster you go.
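As an illustration only (this is not something described in the post), the size of a pending CL could be checked mechanically before review. The sketch below parses the output of `git diff --numstat`, whose lines have the form `<added>\t<deleted>\t<path>`, with `-` in both count columns for binary files. The helper names `parseNumstat` and `tooLarge`, the sample file names, and the 400-line threshold are all assumptions made up for this example; what counts as "small enough" is a judgment call.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// diffStats summarizes the size of a change.
type diffStats struct {
	Files   int // number of files touched
	Added   int // total lines added
	Deleted int // total lines deleted
}

// parseNumstat parses text in the format printed by `git diff --numstat`.
// Binary files are reported with "-" in both count columns; they count as a
// touched file but contribute no lines.
func parseNumstat(out string) (diffStats, error) {
	var st diffStats
	for _, line := range strings.Split(out, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue
		}
		fields := strings.SplitN(line, "\t", 3)
		if len(fields) != 3 {
			return st, fmt.Errorf("malformed numstat line: %q", line)
		}
		st.Files++
		if fields[0] == "-" && fields[1] == "-" {
			continue // binary file: no line counts available
		}
		added, err := strconv.Atoi(fields[0])
		if err != nil {
			return st, fmt.Errorf("bad added count in %q: %w", line, err)
		}
		deleted, err := strconv.Atoi(fields[1])
		if err != nil {
			return st, fmt.Errorf("bad deleted count in %q: %w", line, err)
		}
		st.Added += added
		st.Deleted += deleted
	}
	return st, nil
}

// tooLarge reports whether the change touches more lines than maxLines.
// maxLines is a hypothetical, project-specific review budget.
func tooLarge(st diffStats, maxLines int) bool {
	return st.Added+st.Deleted > maxLines
}

func main() {
	// Hypothetical numstat output; in practice this text would come from
	// running `git diff --numstat` in the repository.
	sample := "12\t3\tparser.go\n40\t0\tparser_test.go\n-\t-\ttestdata/add.wasm\n"
	st, err := parseNumstat(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%d files, +%d -%d, too large: %v\n",
		st.Files, st.Added, st.Deleted, tooLarge(st, 400))
}
```

A line count is only a rough proxy: a 50-line change to core logic can need more review attention than a 500-line mechanical rename, so a check like this can prompt a split but cannot replace the review itself.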

Particularly for refactoring, agents still take the shortest route to the destination. It's important to guide them to think about the "big picture" at all times, and to find all instances where X is better done as Y, not just the single place noticed during a review. This is why it's sometimes OK to have a CL submitted before you fully agree with everything, and go back to it later for several refactoring rounds. Source control works amazingly well when pair-coding with agents.

Testing...
