My AI-built PHP engine in Rust passes 17% of PHP-src tests, renders WordPress

ekinertac

I Don’t Know Rust. My AI Is Rewriting PHP in It Anyway.

A few nights ago I watched my terminal print out a 26 KB WordPress front page: Phargo Test Site, the block-library CSS, “Hello world!” pulled from a SQLite database, and a clean at the bottom. Completely unremarkable output, except for one detail: the PHP engine that served it contains zero lines of PHP’s actual source code. It’s a from-scratch interpreter written in Rust.

Here’s the part I need you to sit with: I don’t know Rust. I have never written a lexer. I could not explain to you what a “tree-walking evaluator” is without reading the Wikipedia article in another tab. If you cornered me at a party and asked how PHP’s garbage collector works, I would fake a phone call.

The engine is called Phargo, and my contribution to it is, roughly, aiming. An AI writes the code. I point it at a target, read what comes back like a medieval king reviewing naval charts (solemn nod, zero comprehension) and type the most powerful phrase in modern software development:

“looks good, continue.”

The Experiment: Radical Honesty as a Build System

Everyone and their houseplant has an AI-built project right now, and every single one comes with the same unfalsifiable claim: “it works!” Works according to whom? The AI that wrote it? The demo that was recorded on the fourth take?

So the whole experiment rests on one idea, borrowed from watching the Bun team drive their JavaScript runtime against real-world test suites: don’t let the AI grade its own homework.

PHP ships with its own test suite: about 22,000 .phpt files written by the PHP internals team over three decades. Tests I didn’t write. Tests the AI didn’t write. Tests that encode every cursed corner of the language, from DateTime daylight-saving math to what exactly var_dump() prints for a float. That suite is the oracle. The scoreboard runs all of it, and the pass rate is auto-generated into the repo after every run.

The number cannot be flattered, negotiated with, or prompted into a better mood. Either bug40261.phpt passes or it doesn’t.

Current score: 3,844 of 22,037, or 17.4% of the entire upstream PHP test suite. And before you snort-laugh at 17%: the realistic ceiling is around 40–45%, because the rest of the suite tests C extensions (GD, curl, SOAP, intl, MySQL drivers…) that are explicitly out of scope. Within the actual playing field, the climb is very real: it started at zero.

My loop as the human is almost embarrassingly thin:

1. The AI runs a failure histogram over the whole corpus to find the biggest cluster of failing tests it can actually fix
2. It implements the thing
3. It runs the ~22,000-test scoreboard (about 7 minutes of fan noise)
4. If the number went up: commit, push, repeat
5. If the number went down: I get to say my other line, “hmm, that regressed, look again”

That’s it. That’s the job. I’ve achieved peak delegation and I’m not even sorry.

The Oracle Cannot Be Bribed. The Harness, However, Lied to My Face.

Early on, the pass rate plateaued in a way that felt wrong. Whole categories of tests, obviously simple ones, kept failing with diffs that looked identical to the expected output. I stared at those diffs like a man staring at two identical photos in a spot-the-difference puzzle, finding nothing.

The difference was invisible because it was literally invisible: carriage returns. The test corpus had been checked out on Windows with CRLF line endings, and our scoreboard compared output byte-for-byte. PHP’s own test runner normalizes line endings before comparing. Ours didn’t. Which means the harness was silently failing essentially every multi-line test in the corpus on line endings alone, and had been for weeks.

One line of normalization code. Hundreds of tests flipped to green instantly.

The lesson tattooed itself onto the project: measure your measurement. Your oracle is only as honest as the plumbing that connects you to it. We now normalize exactly the way run-tests.php does, and every suspicious plateau since has triggered the same question first: is the engine wrong, or is the scoreboard lying?

PHP’s Test Suite Is a Minefield With a Readme

Here’s something nobody tells you about running somebody else’s 22,000-file test corpus: some of those files are bombs. Not malicious ones, accidental ones. Regression tests for ancient memory bugs that allocate absurd structures, generator tests that expand into infinity, tests that were only ever meant to run inside PHP’s own carefully-fenced CI.

I found this out the way all great discoveries are made: my development machine hard-restarted. Not “the program crashed.” Not “the terminal froze.” The entire computer went black and rebooted, because a generator test convinced our...
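
To make the line-ending bug from the harness section concrete, here is a minimal Python sketch of the difference between a byte-for-byte comparison and one that normalizes CRLF to LF first. It is an illustration only, not Phargo's scoreboard code: the function names (`strict_match`, `outputs_match`, `score`) and the three sample cases are made up for this example, and the normalization shown is the simplest plausible version rather than a faithful copy of what run-tests.php does.

```python
def normalize_newlines(text: str) -> str:
    """Convert Windows CRLF line endings to Unix LF."""
    return text.replace("\r\n", "\n")


def strict_match(expected: str, actual: str) -> bool:
    """Byte-for-byte comparison: the behaviour that hid the problem."""
    return expected == actual


def outputs_match(expected: str, actual: str) -> bool:
    """Compare after normalizing line endings on both sides."""
    return normalize_newlines(expected) == normalize_newlines(actual)


def score(cases, compare) -> int:
    """Count how many (expected, actual) pairs pass under `compare`."""
    return sum(1 for expected, actual in cases if compare(expected, actual))


# Hypothetical cases: (expected text from a CRLF checkout, engine output).
cases = [
    ("int(1)\r\nint(2)\r\n", "int(1)\nint(2)\n"),        # differs only in line endings
    ("bool(true)", "bool(true)"),                        # single line, no newline involved
    ('string(2) "ab"\r\n', 'string(2) "ba"\n'),          # a genuine failure
]

print(score(cases, strict_match))   # 1
print(score(cases, outputs_match))  # 2
```

The first case is the interesting one: it fails the strict comparison even though a human reading the diff would see two identical outputs. The third case still fails after normalization, which is the point: the fix removes false failures without hiding real ones.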
