An agent opened this pull request. Nobody asked it to. — Marco Ziccardi
An agent opened this pull request. Nobody asked it to.
22 Jun
Written By Marco Ziccardi
There is a version of the AI and engineering conversation that is pure hype, and a version that is pure caution. I am trying to live in neither. We use AI heavily at Voyfai, and over the last few months I have automated almost the entire path from a problem in production to a pull request that fixes it. A human still approves and merges at the end. Last week we merged our first fully autonomous pull requests: ones an agent opened on its own, from noticing the problem to writing the fix, with no human starting the work.<br>We have leaned on AI to help write code for a long time, like everyone has. The new part is not that an agent can write a fix. It is that nobody told it to. I want to be honest about why that human is there, because it is not the reason people usually give.
First, what the loop actually does<br>It runs in five stages.
It starts in observability. An agent reads our production telemetry in Datadog and looks for what changed: regressions, slow paths, error patterns, the kind of thing you would want a sharp on-call engineer to catch. It is not just matching on the word 'error.' It is looking for the shape of a problem.<br>When it finds something worth acting on, it writes it up as a Jira ticket, with the finding, the evidence, and enough context that the next stage can act on it.<br>A second agent picks up those tickets one at a time, clones the repository fresh, reproduces the problem, and implements a fix in code. It opens a pull request the same way one of us would.<br>Then a third agent works through the review feedback on that pull request. Our setup already runs automated reviewers, Copilot and Codex, and humans leave comments too. The agent reads both, and for the comments that are clearly mechanical it commits the fix, one change at a time.<br>Then it stops, and a human approves and merges.<br>Here is the kind of problem it is built for. Latency on one of our endpoints starts creeping up after a deploy. Nothing pages yet, because the latency is still inside the alert limits. Anomaly detection can catch that kind of drift, but we want action. The first agent notices the drift and writes a ticket: which endpoint, when it started, how big the change is, and the deploy it lines up with. The second agent picks it up, reproduces it, finds a query that a recent change introduced, and opens a pull request that batches the call. Copilot flags a missing null check and the agent adds it. A human asks for a clearer variable name and the agent renames it. By the time an engineer opens that pull request, the diff is small, the description explains the cause, and the review comments are already handled. The only thing left is to decide whether the fix is right, and merge it.
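To make the first stage of that example concrete, here is a minimal sketch of the check it implies: latency that has risen after a deploy but is still under the alert limit. This is not our actual agent. The `Finding` shape, the function names, and the thresholds are all assumptions for illustration, and the latency samples are passed in as plain lists rather than fetched from Datadog.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Finding:
    """Hypothetical shape of what the first stage hands to the ticket stage."""
    endpoint: str
    deploy_id: str
    baseline_ms: float
    current_ms: float

    @property
    def change_pct(self) -> float:
        return (self.current_ms - self.baseline_ms) / self.baseline_ms * 100


def detect_drift(endpoint, deploy_id, before_ms, after_ms,
                 alert_limit_ms, min_change_pct=20.0):
    """Return a Finding when latency rose after a deploy but nothing would page yet.

    before_ms and after_ms are latency samples from before and after the deploy.
    The default threshold is an illustrative assumption, not a recommended value.
    """
    if not before_ms or not after_ms:
        return None
    baseline, current = median(before_ms), median(after_ms)
    if baseline <= 0:
        return None  # cannot express a relative change against a zero baseline
    if current >= alert_limit_ms:
        return None  # already past the limit: paging covers this case
    finding = Finding(endpoint, deploy_id, baseline, current)
    if finding.change_pct < min_change_pct:
        return None  # too small to be worth a ticket
    return finding


def ticket_body(f: Finding) -> str:
    """Render the finding as ticket text: which endpoint, how big, which deploy."""
    return (
        f"Latency drift on {f.endpoint}\n"
        f"Median latency: {f.baseline_ms:.0f} ms -> {f.current_ms:.0f} ms "
        f"(+{f.change_pct:.0f}%)\n"
        f"Lines up with deploy: {f.deploy_id}"
    )


finding = detect_drift("/search", "deploy-123",
                       before_ms=[100, 110, 105],
                       after_ms=[150, 160, 155],
                       alert_limit_ms=500)
if finding is not None:
    print(ticket_body(finding))
```

Comparing medians before and after the deploy is the simplest thing that could work here. A real check would also need to account for traffic mix and time of day, which this sketch ignores.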
That whole path, from the first sign of drift to a reviewed pull request, happened without anyone being pulled away from what they were doing.<br>Now the honest part<br>That human at the end is not there because I believe a person must always have the final say. The story where the machine does the typing and the human keeps the noble judgment is becoming a comfortable one, and I do buy it up to a point. The human is there for two plainer reasons.<br>The first is that we are not ready to remove them yet. Trust in a system like this has to be earned with evidence, not granted because a demo looked good. A loop that opens pull requests against our production code can do real damage if it is confidently wrong, and confidently wrong is exactly what these systems are good at being the moment you stop watching them. So we watch, and we are not yet at the point where I would stop.<br>The second reason is the one I find more interesting. Right now, the human reviewer is also our measurement. Every approval and every rejection is a data point on the question I actually care about: is this loop solving real problems, and solving them well enough that a person would have shipped the same change? What I watch is simple. Does the human merge it as it is, or do they push more commits on top first? When they reject it, why: was the diagnosis wrong, the fix wrong, or just not how we would have done it? How often does the loop open a pull request that goes nowhere? Those numbers tell me whether we are building something that genuinely solves problems, or something that produces plausible-looking pull requests that quietly waste everyone's time. Until that data gets boring, until the human is approving almost everything almost unchanged, we do not really know the loop is good. The reviewer is partly there to tell us when they are no longer needed.<br>So I do not think of that final human step as the destination. I think of it as scaffolding.
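The measurement described above (merged as is, merged after extra human commits, rejected and why, or going nowhere) can be tallied with very little code. The sketch below is one way to do it, not the tooling we run. The `PullRequestOutcome` record, its field names, and the reason labels are assumptions made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class PullRequestOutcome:
    """Hypothetical record of how one agent-opened pull request ended."""
    merged: bool
    human_commits_added: int = 0            # commits a person pushed on top before merging
    rejection_reason: Optional[str] = None  # e.g. "diagnosis", "fix", "style"; None if unknown


def summarize(outcomes):
    """Compute the rates that show whether reviewers approve the loop's work unchanged."""
    total = len(outcomes)
    if total == 0:
        return {"total": 0}
    merged_as_is = sum(1 for o in outcomes if o.merged and o.human_commits_added == 0)
    merged_with_changes = sum(1 for o in outcomes if o.merged and o.human_commits_added > 0)
    not_merged = total - merged_as_is - merged_with_changes
    # Pull requests closed without a stated reason count as ones that went nowhere.
    reasons = Counter(o.rejection_reason or "went nowhere"
                      for o in outcomes if not o.merged)
    return {
        "total": total,
        "merged_as_is_rate": merged_as_is / total,
        "merged_with_changes_rate": merged_with_changes / total,
        "not_merged_rate": not_merged / total,
        "rejection_reasons": dict(reasons),
    }


history = [
    PullRequestOutcome(merged=True),
    PullRequestOutcome(merged=True),
    PullRequestOutcome(merged=True, human_commits_added=2),
    PullRequestOutcome(merged=False, rejection_reason="diagnosis"),
]
print(summarize(history))
```

In these terms, the data getting boring means `merged_as_is_rate` staying close to 1.0 over a long enough run of pull requests.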
The direction we are moving is to automate review itself and take engineers out of the business of reviewing routine changes...