~6 months of "agentic" coding | Ashutosh Sathe
👋">
So I was a complete skeptic of the whole “agentic” / “vibe” coding phenomenon before last year’s NeurIPS. While I was in California anyway, I decided to visit a bunch of friends in San Francisco. I knew SF was “big” on vibe coding, but I wasn’t aware of just how much seemingly everyone believed in it. Even some of my colleagues told me that the internal agentic coding harnesses had improved significantly and that they didn’t expect to be writing much code “by hand” in 2026.
I was still skeptical that agentic coding could be useful or relevant for my particular class of work. But I decided to try my friend’s Claude Code to forward-port a Minecraft performance optimization mod I had written ages ago and was disproportionately proud of. Not only did Claude oneshot the port, it also informed me that the mod was no longer necessary because the functionality had long since been incorporated into PaperMC.
Suddenly I felt very behind. Not in the “everyone is getting rich except me” sense, but in the sense that I might be dismissing a tool that had already become genuinely useful. I decided to give agentic coding a serious shot.
This blog is a reflection on roughly 6.5 months of using coding agents professionally and for hobby projects.
Background
Before getting into the details, a quick “who am I and what am I using this for?”
I work as a Research Engineer at Google DeepMind. Outside of work, I primarily write code for fun: game modifications, videogame tinkering, hobbyist ML projects and the occasional cursed side project.
During these six months, I used agentic coding primarily to iterate on and implement research ideas professionally. Outside of work, I also started playing Elden Ring (my first serious soulslike playthrough). I quickly decided that some parts of the experience could be improved.
In particular, I wanted a bespoke save mechanism. I got tired of applying exactly the same sequence of buffs outside a boss arena and repeatedly fighting through phase one just to get a few attempts at phase two.
“bUt tHis iSn’T wHaT mIyaZAki iNteNdeD”
Yes, I’m aware. I should probably experiment with different builds and embrace the suffering. I don’t have time for that. Even with my shortcuts, I still suffered plenty and had to spend an endless amount of time optimizing my build before finally beating Elden Beast a few weeks ago.
Since I was completely new to Elden Ring modding, I had no idea where to start. So I used coding agents to both brainstorm and implement the save system.
Most of my professional usage was in Python. Personal projects were a mix of Python, Rust and C++, though the final Elden Ring mod ended up being written in Rust.
First month
The first month was painful.
My expectations had been set sky high by Claude Code oneshotting the Minecraft mod. At work, I had access to an internal version of Antigravity (Google’s externally known agentic coding tool) that could use early Gemini release candidates.
I have a suite of internal tools that automate repetitive tasks. Think dumping results into spreadsheets, uploading logs to visualization UIs, generating reports, and so on.
At the time, I was investigating a relatively new problem and wanted to try several low-hanging interventions. The task was technically fairly simple: it would need the agent to just plumb 5 of my existing tools and read out the results.
Armed with a LOT of excitement, I pointed Antigravity at my tooling, described what I wanted, and pressed Enter.
The anticipation went through the roof and immediately into the floor.
The agent did something unbelievably stupid.
Okay, fine. Bad prompt. Let me be more explicit.
Same failure.
After several more attempts, I explicitly described the exact five steps I would have performed manually. The agent still failed the final step.
After that experience, I dramatically recalibrated my expectations.
I reduced agentic coding to a glorified implementation engine:
I’ve already written the design doc. Please turn it into code.
However, then the design docs became extremely verbose and detailed. Sometimes the exploratory utilities I was building had so little reuse value that writing the design doc took longer than implementing them myself.
Still, it was a relatively quiet period. People were on vacation, priorities for the next year were still being finalized, and I had the luxury of being inefficient.
The generated code was usually functional. It was also often terrible.
The models loved inventing bespoke serialization structures that should never have existed. They would create file-specific abstractions, unnecessary helper classes, and layers of indirection that solved problems nobody actually had.
This wasn’t unique to Gemini. Claude frequently behaved the same way in my...