The Mathi Problem


The Mathi Problem: Why AI-Generated Features Ship Fast and Break Slow

Nicolas Hernandez | June 25, 2026 | 13 min read

Part of the Xmartlabs AI Journey series. Previously: Designing IA Level Up.

TL;DR

One of our team members shipped a feature in half a day using AI-assisted development. Then, I spent three days fixing the bugs it introduced. This is not an outlier; it is the most common failure mode of AI-assisted development. The good news is that it is entirely preventable. This post breaks down what happened, why the industry data is alarming, and the concrete prevention framework we now apply to every AI-assisted task.

The Incident

Summary

A developer on our team (Mathi) used AI tooling to implement a feature. The AI generated working code fast. By midday, the feature was done: implemented, apparently functional, ready for review. What followed was three days of debugging: subtle bugs, edge cases the AI had not considered, and logic errors that only surfaced under real conditions.

What happened

The AI did what it was asked to do. It produced code that compiled, ran, and appeared to work for the primary use case. But it lacked understanding of the broader system context: edge cases, existing patterns, and implicit requirements that an experienced developer would have internalized.

This is not a story about blame. Mathi did nothing wrong; this is what happens when a capable tool meets a workflow that was not designed for it. The AI wrote code the way a confident but inexperienced engineer would: quickly, plausibly, and without the judgment that comes from maintaining a codebase over months.

The ratio tells the story: half a day building, three days fixing. A 6:1 ratio of debugging to development. The feature was not faster. It was faster to appear done.
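The arithmetic behind that ratio can be spelled out. The sketch below uses only the two durations stated above (half a day building, three days fixing); the variable names are illustrative, and the "share spent fixing" figure is simply derived from those two numbers.

```python
# Durations from the incident, in working days.
build_days = 0.5  # time until the feature looked done
fix_days = 3.0    # time spent debugging afterwards

debug_to_dev_ratio = fix_days / build_days  # debugging time per unit of building time
total_days = build_days + fix_days          # time until the feature was actually done
share_spent_fixing = fix_days / total_days  # fraction of total effort spent on fixes

print(f"debug:dev ratio = {debug_to_dev_ratio:.0f}:1")
print(f"time to actually done = {total_days} days")
print(f"share of effort spent fixing = {share_spent_fixing:.0%}")
```

Measured to "looks done", the feature took half a day. Measured to "actually done", it took three and a half.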

Why This Keeps Happening

Our experience is not unique. The industry data on AI-generated code quality is sobering:

~45% of AI-generated code contains security vulnerabilities. Veracode's 2025 GenAI Code Security Report tested 100+ LLMs across 80 code completion tasks and found that AI models chose the insecure coding path 45% of the time.

1.7x more defects in AI-generated code compared to human-written code. CodeRabbit's analysis of 470 open-source PRs found that AI-authored code averaged 10.83 issues per PR, compared with 6.45 for human-only code.

Pull requests are 18% larger when developers use AI coding assistants, according to a Jellyfish study tracking millions of PRs across 500+ companies.

Delivery stability decreases measurably with AI adoption. Google's DORA 2024 report found that a 25% increase in AI adoption was associated with a 7.2% decrease in delivery stability, and its 2025 follow-up confirmed the trend hasn't reversed.

These numbers paint a grim picture. If you stopped reading here, you would conclude that AI coding assistants make things worse, not better.
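As a quick sanity check, the 1.7x defect multiplier follows directly from the two CodeRabbit per-PR averages quoted above. The variable names here are illustrative; the inputs are the figures as cited in this post.

```python
# Average issues per pull request, as cited from CodeRabbit's analysis.
ai_issues_per_pr = 10.83
human_issues_per_pr = 6.45

# Ratio of AI-authored to human-only issue counts.
defect_multiplier = ai_issues_per_pr / human_issues_per_pr

print(f"AI-authored PRs carry about {defect_multiplier:.1f}x the issues")
```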

The "But" That Changes Everything

Here is the critical detail that most coverage of these statistics omits: these numbers come from teams using AI tools without proper configuration.

No project-level instructions. No verification workflows. No constraints on what the AI should or should not do. No structured review process for AI-generated output.

In other words, these teams gave an eager but inexperienced assistant the keys to the codebase and said, "Go." No onboarding, no guardrails, no code review process adapted for AI output.

That is the environment that produces a 1.7x defect rate. That is the environment where incidents spike. And that was, honestly, the environment we were operating in when the Mathi Problem happened.

The question is not "should we use AI for coding?"; that ship has sailed. The question is: how do we configure the environment so AI-generated code meets the same standards as human-written code?

The Root Cause

During our IA Level Up training program, we landed on a mental model that reframed everything:

Think of AI as a junior developer: fast, capable, and without judgment.

This is not dismissive. Junior developers are valuable. They learn fast, produce volume, and, given the right mentorship and review process, ship good work. But nobody gives a junior developer unsupervised access to production in their first week.

What went wrong in the Mathi Problem was not that the AI wrote bad code. It was that the AI operated without:

Project context: It did not know our conventions, architectural patterns, or existing abstractions.

Constraints: It was not told what not to do. It did not know our testing requirements, our security considerations, or our deployment patterns.

A review framework: The code review process was optimized for human-authored code, not for the specific failure modes of AI-generated code.

The root cause was not AI. It was the absence of a verification and prevention framework...
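One way to make the three gaps listed above checkable before an AI-assisted task starts is a small pre-flight check. This is only an illustrative sketch, not the framework used at Xmartlabs; `AiTaskSetup` and `missing_safeguards` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass


@dataclass
class AiTaskSetup:
    """Hypothetical record of what is in place before an AI-assisted task."""

    has_project_context: bool      # conventions, architecture, existing abstractions
    has_constraints: bool          # explicit "do not" rules: testing, security, deployment
    has_ai_review_checklist: bool  # review adapted to AI-specific failure modes


def missing_safeguards(setup: AiTaskSetup) -> list[str]:
    """Return the names of the safeguards that are absent, in list order."""
    checks = {
        "project context": setup.has_project_context,
        "constraints": setup.has_constraints,
        "review framework": setup.has_ai_review_checklist,
    }
    return [name for name, present in checks.items() if not present]


# The situation described in this post: none of the three were in place.
incident = AiTaskSetup(
    has_project_context=False,
    has_constraints=False,
    has_ai_review_checklist=False,
)
print(missing_safeguards(incident))
```

A check like this does not verify the quality of the context or constraints. It only makes their absence visible before the work starts rather than three days after.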
