The Joule Index - The Auditable AI Cost and Energy Benchmark
Benchmark report · Reported May 2026<br>The only AI benchmark where every score is auditable.<br>Real May 2026 open-source bug fixes. Real published model prices. Real maintainer-merge-ready diffs. Four axes (dollars, joules, attention, accessibility) on a single chart, with full observational traces published under the same disclosure rules MLCommons established for MLPerf Power.<br>9 of 9 runs · attention F1<br>1.000<br>3 real OSS tasks across 3 Dropstone tiers
Heavy vs Fast joule premium<br>7.5×<br>for identical engineering output
Mean $ / task · Fast<br>$0.082<br>Dropstone Fast via Dropstone CLI
The finding · Reported May 2026<br>Across three model tiers and three real bugs, nine runs produced the same merged diff.<br>Blankline's research team ran Dropstone CLI on three real May 2026 open-source bug fixes: tiny RSS routes in DIYgod's RSSHub (a 44,000-star aggregator) and an 8-file refactor to Mozilla's Common Voice bundler. Every run produced a Pull Request that matched the diff a real maintainer had merged into production.<br>The cheapest tier did it for $0.082 per task and 224 joules. The flagship tier did the same work for $0.857 per task and 1,693 joules. Same diff. Same merge-readiness.<br>On this evidence, the premium is paying for compute, not capability.
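The premium between tiers can be checked directly from the two pairs of figures quoted above. The sketch below is only an illustration: it assumes the "flagship tier" is Dropstone Heavy and the "cheapest tier" is Dropstone Fast, and it uses the rounded per-task means as published, so the joule ratio may differ slightly from the 7.5× headline if that figure was derived from unrounded measurements. The names `FAST`, `HEAVY`, and `premium` are hypothetical and not part of any Joule Index tooling.

```python
# Rounded per-task means as quoted in the report.
# Assumption: "cheapest tier" = Dropstone Fast, "flagship tier" = Dropstone Heavy.
FAST = {"usd": 0.082, "joules": 224}
HEAVY = {"usd": 0.857, "joules": 1693}


def premium(heavy: float, fast: float) -> float:
    """Return how many times larger the heavy-tier figure is than the fast-tier one."""
    return heavy / fast


usd_premium = premium(HEAVY["usd"], FAST["usd"])
joule_premium = premium(HEAVY["joules"], FAST["joules"])

print(f"dollar premium: {usd_premium:.2f}x")
print(f"joule premium:  {joule_premium:.2f}x")
```

On the rounded figures, the dollar premium comes out a little above 10× and the joule premium a little above 7.5×, for what the report describes as the same merged diff.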
Why this benchmark, why now<br>Forty-seven AI benchmarks are in active use in May 2026. None of them tell you what their numbers cost. Not in dollars, not in joules, not in the human hour they were supposed to replace.<br>The Joule Index is the first to publish all of it on one chart, and the first to require Verified disclosure for every entry on the leaderboard, modeled on the MLPerf Power regime that made MLPerf the gold standard of inference benchmarking.<br>If a benchmark score cannot be verified, it is not a benchmark. It is marketing.
Figure 1 · The headline image<br>All nine runs shipped working code. The only thing that changed was the bill.<br>Each marker is one (task × tier) run. Lower-left is better. Brighter cyan = cheaper tier.<br>[Figure: chart with axes "Dollars per merge-ready PR (log)" and "Joules per task (log)". Labeled entries: Claude Haiku 4.5, Claude Opus 4.7, Claude Sonnet 4.6, Dropstone Fast (current leader), Dropstone Heavy, Dropstone Pro, Gemini 3.1 Flash, Gemini 3.1 Pro.]<br>Source · The Joule Index · Blankline · joule.blankline.org · Reported May 2026
Verified<br>Auditable by default<br>Every score on the leaderboard carries a sanitized observational trace. Tokens, costs, joules, file diffs. Anyone can re-score.
Live OSS<br>Real bugs, last week<br>Tasks are real GitHub issues filed and merged within the last 30 days. Contamination mathematically prevented.
Four axes<br>One chart, four readers<br>Dollars for the CFO. Joules for the climate scientist. Attention for the ML researcher. Accessibility for the median human.
The civilizational question<br>The cost of intelligence is the price of admission to civilization's next era.<br>The race for AGI is real. The questions everyone asks are "when?" and "who?" The questions almost nobody asks are "at what cost?" and "to whom is it accessible?"<br>Every joule of intelligence the species generates has to come from a power grid. Every dollar a procurement team spends on AI is a dollar that does not go to housing, healthcare, or science. Every architectural choice frontier labs make, whether to cache prompts, how to price tiers, or how to disclose energy, shapes the affordability of the next decade of cognitive labor.<br>The Joule Index is the only benchmark that holds those choices accountable in the open.