A month of vibe-coding at 0.01x velocity - webesque
Exchanging my data for service use, I took up OpenAI's offer for a free<br>trial of ChatGPT Plus for a month. After slowly vibecoding an IDE plugin<br>throughout last month, I'm eager to share my notes and lingering thoughts.
Marius Ghița
19th of June, 2026
This is not the first time I've tried a fully<br>vibecoded<br>project. Last year had me<br>return multiple times to different providers to form my own opinion given the disparity<br>of public opinion. I never walked away before with the feeling that coding is solved.
What follows is a rundown of my experience over the course of the month, followed by<br>my takeaways.
The project, agent operation, and disclosure of preconceptions
Selected a project that was unreasonable for a limited timeframe, given my skills.<br>A plugin for the IntelliJ IDE platform, written in Kotlin. A new domain, and a<br>language I don't use. Small in scope, greenfield project, with<br>no entrenched domain knowledge-checks, for the model, or for me.
As is customary across the industry to build coding harnesses the plugin is another LLM<br>integration plugin, in
the sea of LLM plugins. In few words, a plugin that works well with locally hosted<br>models, using them as ad-hoc fill-in-the-middle capable models, and never<br>proactively interrupting the flow of programming.<br>See the Alternatively prompt GitHub<br>repository for more information.
I believe that coding<br>agents<br>are best used as tools to generate prototypes.<br>Attempting to solve problems that<br>are outside the area of expertise of the operator. A similar belief concerns<br>the practical applications of general purpose LLMs for business integrations.
The setup, and the agent operation
Ask ten people how to best operate an<br>agent<br>and you'll get eleven answers. Maybe<br>you'll also receive suggestions to run agents that run agents which check agents that<br>control agents that actually code. If AI vibe-coding ultimately is the future, I'd like to<br>prompt the way I code,
lazily.
I've kept things simple though not optimal.
And if there is an optimal, companies that<br>are a bit too well funded (pre-IPO) should be the ones enlightening us on the one true way. ™
Best in class model.
Unless limited by the tool (planning mode), always ran GPT-5.5 xhigh (extra high reasoning).
Planning for larger features.
The project had three main project features, and several functional rewrites. Most, if not<br>all, of these went through the general plan then execute flow. In general improvements<br>that only consisted of a single short sentence prompt (change color, add accessibility label,<br>fix prompt for e2e test, etc) skipped the plan phase.
No agent-specific project adjustments beforehand.
No AGENTS.md<br>or equivalent markdown files. Whatever was in the upstream base template repository used<br>as is without review. Later in the process, when the agent forgot how to run tests, was instructed to<br>generate an AGENTS.md file.
The weekly limits, and the effective daily limits
On the Plus plan, as expected, the quotas become obvious quickly.<br>Their terms of service nowadays state those quotas are based on token usage, but when full allocation<br>and utilization are not visible throughout the interface, how do you gauge how long you can work<br>in a session?
Progress wasn't steady every day. Some days I would be fixing an issue or two, and due to a more<br>extensive e2e run I would be running out of requests pretty quickly. I assume in part due to<br>the internal looping of<br>codex<br>to check if processes have finished. If every time it checked that<br>a long running e2e test<br>wasn't done, and if it sent the entire context with each request,<br>I can only imagine how much useless computation it could chew through.
Surprising model behaviour when close to the usage limit.
I quickly learned that I should avoid any meaningful request when closing in on the 5 hour limit.<br>While in the past I've seen model behaviour in which a response would be cut off abruptly (my<br>experience last year when trialing JetBrains' Junie<br>agent),<br>with codex<br>I felt shortchanged.<br>Instead of producing a smaller part due the constraints, the model didn't stray from the<br>prompt but made undesirable shortcuts. Instead of adjusting an end-to-end test as requested,<br>it hardcoded asserts for tests to pass. Behaviour unobserved outside of this scenario.
The weekly limits most<br>of the time would reset sooner than what was advertised in the tool.
I could draft up<br>my own theory on why the dates and times were out of sync, but you won't hear me complain<br>loudly, as this allowed me to work more on this project more often. Good reminder<br>that you can have billions of dollars in funding and still make the most basic mistakes<br>even with PhD-level intelligence at your disposal.
Model degradation during peak hours?
It is a known fact that AI labs are capacity constrained, though how that affects day to day<br>usage remains mostly a guess. In practice what I've noticed, was that around the time US East...