I Don't Know Rust. My AI Is Rewriting PHP in It Anyway. - ekinertac<br>A few nights ago I watched my terminal print out a 26 KB WordPress front page: Phargo Test Site, the block-library CSS, “Hello world!” pulled from a SQLite database, and a clean closing tag at the bottom. Completely unremarkable output, except for one detail: the PHP engine that served it contains zero lines of PHP’s actual source code. It’s a from-scratch interpreter written in Rust.<br>Here’s the part I need you to sit with: I don’t know Rust. I have never written a lexer. I could not explain to you what a “tree-walking evaluator” is without reading the Wikipedia article in another tab. If you cornered me at a party and asked how PHP’s garbage collector works, I would fake a phone call.<br>The engine is called Phargo, and my contribution to it is, roughly, aiming. An AI writes the code. I point it at a target, read what comes back like a medieval king reviewing naval charts (solemn nod, zero comprehension), and type the most powerful phrase in modern software development:<br>“looks good, continue.”<br>The Experiment: Radical Honesty as a Build System<br>Everyone and their houseplant has an AI-built project right now, and every single one comes with the same unfalsifiable claim: “it works!” Works according to whom? The AI that wrote it? The demo that was recorded on the fourth take?<br>So the whole experiment rests on one idea, borrowed from watching the Bun team drive their JavaScript runtime against real-world test suites: don’t let the AI grade its own homework.<br>PHP ships with its own test suite: about 22,000 .phpt files written by the PHP internals team over three decades. Tests I didn’t write. Tests the AI didn’t write. Tests that encode every cursed corner of the language, from DateTime daylight-saving math to what exactly var_dump() prints for a float. That suite is the oracle. 
The scoreboard runs all of it, and the pass rate is written into the repo automatically after every run.<br>The number cannot be flattered, negotiated with, or prompted into a better mood. Either bug40261.phpt passes or it doesn’t.<br>Current score: 3,844 of 22,037, or 17.4% of the entire upstream PHP test suite. And before you snort-laugh at 17%: the realistic ceiling is around 40–45%, because the rest of the suite tests C extensions (GD, curl, SOAP, intl, MySQL drivers…) that are explicitly out of scope. Within the actual playing field, the climb is very real: it started at zero.<br>My loop as the human is almost embarrassingly thin:<br>The AI runs a failure histogram over the whole corpus to find the biggest cluster of failing tests it can actually fix.<br>It implements the thing.<br>It runs the ~22,000-test scoreboard (about 7 minutes of fan noise).<br>If the number went up: commit, push, repeat.<br>If the number went down: I get to say my other line, “hmm, that regressed, look again.”<br>That’s it. That’s the job. I’ve achieved peak delegation and I’m not even sorry.<br>The Oracle Cannot Be Bribed. The Harness, However, Lied to My Face.<br>Early on, the pass rate plateaued in a way that felt wrong. Whole categories of tests (obviously simple ones) kept failing with diffs that looked identical to the expected output. I stared at those diffs like a man staring at two identical photos in a spot-the-difference puzzle, finding nothing.<br>The difference was invisible because it was literally invisible: carriage returns. The test corpus had been checked out on Windows with CRLF line endings, and our scoreboard compared output byte-for-byte. PHP’s own test runner normalizes line endings before comparing. Ours didn’t. That meant the harness was silently failing essentially every multi-line test in the corpus on line endings alone, and had been for weeks.<br>One line of normalization code. 
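For the curious, a fix of that shape can be sketched in a few lines of Rust. This is an illustration, not Phargo's actual code: the helper names `normalize_newlines` and `outputs_match` are invented here, and the sketch assumes that folding CRLF into LF on both sides is all the comparison needs.

```rust
/// Fold Windows CRLF line endings into LF.
/// Hypothetical helper for illustration; not taken from the Phargo repo.
fn normalize_newlines(s: &str) -> String {
    s.replace("\r\n", "\n")
}

/// Compare expected and actual test output after normalizing both sides,
/// so a difference in line endings alone no longer counts as a failure.
fn outputs_match(expected: &str, actual: &str) -> bool {
    normalize_newlines(expected) == normalize_newlines(actual)
}

fn main() {
    let expected = "Hello\r\nworld\r\n"; // expectation checked out with CRLF
    let actual = "Hello\nworld\n"; // what the engine printed

    // The byte-for-byte comparison fails on the invisible carriage returns.
    println!("raw bytes equal:  {}", expected == actual);
    // The normalized comparison treats the two outputs as identical.
    println!("normalized equal: {}", outputs_match(expected, actual));
}
```

Normalizing both sides, rather than only the expected text, keeps the comparison symmetric no matter which side picked up the carriage returns.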
Hundreds of tests flipped to green instantly.<br>The lesson tattooed itself onto the project: measure your measurement. Your oracle is only as honest as the plumbing that connects you to it. We now normalize exactly the way run-tests.php does, and every suspicious plateau since has triggered the same question first: is the engine wrong, or is the scoreboard lying?<br>PHP’s Test Suite Is a Minefield With a Readme<br>Here’s something nobody tells you about running somebody else’s 22,000-file test corpus: some of those files are bombs. Not malicious ones, accidental ones. Regression tests for ancient memory bugs that allocate absurd structures, generator tests that expand into infinity, tests that were only ever meant to run inside PHP’s own carefully-fenced CI.<br>I found this out the way all great discoveries are made: my development machine hard-restarted. Not “the program crashed.” Not “the terminal froze.” The entire computer went black and rebooted, because a generator test convinced our...