I use LLMs as a staff engineer in 2026

A bit over a year ago I wrote “How I use LLMs as a staff engineer”. Here’s a brief summary of what I used AI for last year:

Smart autocomplete with Copilot

Short tactical changes in areas I don’t know well (always reviewed by a SME)

Writing lots of use-once-and-throwaway research code

Asking lots of questions to learn about new topics (e.g. the Unity game engine)

Last-resort bugfixes, just in case it can figure it out immediately

Big-picture proofreading for long-form English communication

Here are some tasks I explicitly didn’t use AI for last year:

Writing whole PRs for me in areas I’m familiar with

Writing ADRs or other technical communications

Research in large codebases and finding out how things are done

February 2025 was a long time ago. Back then the best model was the first reasoning model, OpenAI’s o1. Agents sort of worked, but would often get stuck or thrown off by compaction. What’s changed since then?

Agents are good now

The biggest change is that I now use LLMs to produce entire PRs in areas I’m familiar with. A year ago I would very occasionally ask an agent to make changes to a single file if it was a simple change I couldn’t be bothered typing out. Sometimes I would copy a function I wrote into an LLM chat window for feedback. But now I start every single change by asking an agent to solve the problem, and usually push the PR after a single editing pass.

In late 2025 I used a lot of open VSCode windows. In early 2026, that changed to terminal tabs with the Copilot CLI, particularly when I needed to make changes across multiple repos at the same time. Now I use the GitHub Copilot app a lot (tens of sessions per day).

This reflects a shift from having to line-edit the agent as it went to only doing an editing pass right at the end. Early agents would go wrong a lot and not be able to recover, so it was valuable to keep an eye on their thought processes and step in to pause them and set them right. In my experience, current agents move too fast for that, and they recover from their own mistakes most of the time anyway.

Sometimes I don’t even need to make edits and I can just push the change as-is, though this is rare: if nothing else, I typically go through and remove some of the over-commenting and other LLM-isms.

I do a lot of skimming through and evaluating agent changes. Most of the time I reject them entirely, just based on “eh, that’s not what I was thinking”. On average it takes me about thirty seconds to make this initial assessment. If the change looks alright after that, I’ll dig in and do a proper review to make sure I understand it and it’s doing the right thing. For difficult tasks, I’ll often reject five or six (or more!) agent attempts before accepting one as good enough to work with, or giving up and making the change by hand.

Investigating bugs

I rely on LLMs even more for bug-hunting than I do for making changes. In 2025, I used to throw the occasional bug at an LLM, just in case it was able to rapidly come up with an explanation. Now I throw every bug at an LLM (typically by opening a new agent session and pasting in the bug report), because it’s able to correctly diagnose 80% of issues on its own. Current agents are really good at chasing down bugs, particularly when you give them a vantage point across multiple repositories.

I’m still better at it, though. Just last week I had a tricky bug that took about fourteen agent sessions before one finally figured it out. What was I doing in between and around those sessions?

Digging up extra context on the bug (from logs, Slack, etc) and reporting it to the agents

Building my own mental model of the problem, of course

Setting up my own reproduction of the bug (in parallel with the agents’ efforts)

Responding to agent sessions with “no, your theory can’t be right because of X” (or just killing and restarting the session with that extra hint)

Ultimately an agent was the one to catch the bug. But I still count it as my find, because by that point I had narrowed the search space tightly enough that agent session #14 had a significantly easier problem to solve than agent session #1. In other words, human expertise still matters a lot for investigating bugs.

Writing

I almost always write my own PR descriptions, since LLMs over-communicate and are bad at expressing the “core idea” behind a change. Writing the PR description by hand also signals to reviewers that I’ve reviewed the change myself, and I’m not asking them to be the first human to read the diff. The only time when I don’t write the PR description is when the change is trivial and the agent-generated description is one sentence. At that point I just leave it alone.

I still don’t use LLMs to write Slack messages, ADRs, issues and so forth. I believe I have a better sense of what’s important to communicate, and I want to signal that there’s a human being thinking about the content.

I still never use LLMs to write...
