My AI-built PHP engine in Rust passes 17% of PHP-src tests, renders WordPress

I Don't Know Rust. My AI Is Rewriting PHP in It Anyway. - ekinertac← essaysI Don’t Know Rust. My AI Is Rewriting PHP in It Anyway. A few nights ago I watched my terminal print out a 26 KB WordPress front page — Phargo Test Site, the block-library CSS, “Hello world!” pulled from a SQLite database, and a clean at the bottom. Completely unremarkable output, except for one detail: the PHP engine that served it contains zero lines of PHP’s actual source code . It’s a from-scratch interpreter written in Rust. Here’s the part I need you to sit with: I don’t know Rust. I have never written a lexer. I could not explain to you what a “tree-walking evaluator” is without reading the Wikipedia article in another tab. If you cornered me at a party and asked how PHP’s garbage collector works, I would fake a phone call. The engine is called Phargo, and my contribution to it is, roughly, aiming. An AI writes the code. I point it at a target, read what comes back like a medieval king reviewing naval charts — solemn nod, zero comprehension — and type the most powerful phrase in modern software development: “looks good, continue.” The Experiment: Radical Honesty as a Build System Everyone and their houseplant has an AI-built project right now, and every single one comes with the same unfalsifiable claim: “it works!” Works according to whom? The AI that wrote it? The demo that was recorded on the fourth take? So the whole experiment rests on one idea, borrowed from watching the Bun team drive their JavaScript runtime against real-world test suites: don’t let the AI grade its own homework. PHP ships with its own test suite — about 22,000 .phpt files written by the PHP internals team over three decades. Tests I didn’t write. Tests the AI didn’t write. Tests that encode every cursed corner of the language, from DateTime daylight-saving math to what exactly var_dump() prints for a float. That suite is the oracle. The scoreboard runs all of it, and the pass rate is auto-generated into the repo after every run. The number cannot be flattered, negotiated with, or prompted into a better mood. Either bug40261.phpt passes or it doesn’t. Current score: 3,844 of 22,037 — 17.4% of the entire upstream PHP test suite. And before you snort-laugh at 17%: the realistic ceiling is around 40–45%, because the rest of the suite tests C extensions (GD, curl, SOAP, intl, MySQL drivers…) that are explicitly out of scope. Within the actual playing field, the climb is very real — it started at zero. My loop as the human is almost embarrassingly thin: The AI runs a failure histogram over the whole corpus to find the biggest cluster of failing tests it can actually fix It implements the thing It runs the ~22,000-test scoreboard (about 7 minutes of fan noise) If the number went up: commit, push, repeat If the number went down: I get to say my other line, “hmm, that regressed, look again” That’s it. That’s the job. I’ve achieved peak delegation and I’m not even sorry. The Oracle Cannot Be Bribed. The Harness, However, Lied to My Face. Early on, the pass rate plateaued in a way that felt wrong. Whole categories of tests — obviously simple ones — kept failing with diffs that looked identical to the expected output. I stared at those diffs like a man staring at two identical photos in a spot-the-difference puzzle, finding nothing. The difference was invisible because it was literally invisible: carriage returns. The test corpus had been checked out on Windows with CRLF line endings, and our scoreboard compared output byte-for-byte. PHP’s own test runner normalizes line endings before comparing. Ours didn’t. Which means the harness was silently failing essentially every multi-line test in the corpus on line endings alone, and had been for weeks. One line of normalization code. Hundreds of tests flipped to green instantly. The lesson tattooed itself onto the project: measure your measurement. Your oracle is only as honest as the plumbing that connects you to it. We now normalize exactly the way run-tests.php does, and every suspicious plateau since has triggered the same question first — is the engine wrong, or is the scoreboard lying? PHP’s Test Suite Is a Minefield With a Readme Here’s something nobody tells you about running somebody else’s 22,000-file test corpus: some of those files are bombs. Not malicious ones — accidental ones. Regression tests for ancient memory bugs that allocate absurd structures, generator tests that expand into infinity, tests that were only ever meant to run inside PHP’s own carefully-fenced CI. I found this out the way all great discoveries are made: my development machine hard-restarted . Not “the program crashed.” Not “the terminal froze.” The entire computer went black and rebooted, because a generator test convinced our...

My AI-built PHP engine in Rust passes 17% of PHP-src tests, renders WordPress

Related Articles

(no title)

Scientists reverse brain aging, with a nasal spray

AI has torched the market for junior programmers

Is AI ruining our skills? Early results are in – and they're not good

The Anatomy of an AI-Native Org