NUTMEG — Can a model beat the closing line?
Live model · any nation vs any nation
Pick two countries. See what the model thinks.
The same Elo engine used in the experiment below, running in your browser over 284 national teams. Neutral venue is on by default, as at a World Cup.
2026 World Cup · simulated 20,000 times
Who wins the 2026 World Cup?
I play out the rest of the tournament twenty thousand times and count the trophies. Even the favourite comes out only about a one-in-five shot, which is how single tournaments go.
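To make "play it out and count the trophies" concrete, here is a minimal sketch of that kind of simulation. It is not the site's engine: the four-team bracket, the ratings, and the function names are all made up for illustration, and I assume a plain Elo win probability on a neutral venue, with extra time and penalties folded into a single win chance.

```python
import random
from collections import Counter

def win_prob(r_a, r_b):
    """Standard Elo expectation for A over B on a neutral venue."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def play_bracket(teams, ratings, rng):
    """Play single-elimination rounds until one team is left.

    Teams are paired in list order: (0 vs 1), (2 vs 3), and so on.
    """
    alive = list(teams)
    while len(alive) > 1:
        next_round = []
        for a, b in zip(alive[0::2], alive[1::2]):
            a_wins = rng.random() < win_prob(ratings[a], ratings[b])
            next_round.append(a if a_wins else b)
        alive = next_round
    return alive[0]

def simulate(teams, ratings, n_sims=20_000, seed=0):
    """Run the bracket n_sims times and return each team's share of titles."""
    rng = random.Random(seed)
    trophies = Counter(play_bracket(teams, ratings, rng) for _ in range(n_sims))
    return {t: trophies[t] / n_sims for t in teams}

# Hypothetical ratings for a four-team bracket (semi-finals, then a final).
ratings = {"A": 2000, "B": 1900, "C": 1850, "D": 1800}
title_odds = simulate(["A", "B", "C", "D"], ratings)
```

Even in this toy bracket the top-rated side wins well under half the time, and a full tournament adds more rounds for the favourite to slip in.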
Win it · Reach final · Reach semis
The experiment
The only test I trust
A model can call winners all day and still lose money, because the price already reflects who's likely to win, and the bookmaker then takes a cut on top.
So the test isn't whether it picks winners. It's whether it can beat the closing line, the last price before kickoff. By then the price has absorbed every public model and all the sharp money, which makes it about as accurate as a price gets. Beat that, or there's no edge to find.
I wrote the pass/fail rule down before running anything: an edge counts only if, out of sample, CLV (closing line value: whether the price moved toward my bets between placing them and the close) comes out above zero, and ROI after the cut is positive. CLV matters more because it doesn't ride on who won any single game, which makes it a far less noisy signal than ROI.
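The rule is easy to state in code. This is a sketch under one assumption: I measure CLV as the ratio of the decimal odds taken to the closing decimal odds, minus one, which is a common convention but not the only one. The `Bet` class and the three sample bets are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bet:
    taken_odds: float    # decimal odds at the moment the bet was placed
    closing_odds: float  # decimal odds on the same outcome at the close
    won: bool
    stake: float = 1.0

def clv(bet):
    """Positive when the price shortened after the bet, i.e. moved toward it."""
    return bet.taken_odds / bet.closing_odds - 1.0

def mean_clv(bets):
    return sum(clv(b) for b in bets) / len(bets)

def roi(bets):
    """Profit per unit staked. Quoted odds already include the bookmaker's cut."""
    profit = sum(b.stake * (b.taken_odds - 1.0) if b.won else -b.stake for b in bets)
    return profit / sum(b.stake for b in bets)

def passes(bets):
    """The pre-registered rule: CLV above zero AND ROI positive."""
    return mean_clv(bets) > 0 and roi(bets) > 0

# Made-up sample: the price moved my way on two of three bets, but only one won.
bets = [
    Bet(taken_odds=2.10, closing_odds=2.00, won=True),
    Bet(taken_odds=3.00, closing_odds=3.20, won=False),
    Bet(taken_odds=1.80, closing_odds=1.75, won=False),
]
```

In this sample the mean CLV is slightly positive while ROI is negative, so the rule fails. Three bets say nothing either way, which is why the real test needs thousands.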
What we actually did
Built the simplest model: Elo
One rating per team, learned only from past results, dates, and venue. No players, no injuries. A Davidson draw model turns rating gaps into home/draw/away odds. It only sees games in date order, so it can't peek at the future.
The club benchmark
Walk-forward over 24,359 matches in the five big European leagues, betting value picks at Pinnacle's pre-close price and grading against its close.
Pivoted to the World Cup
Rebuilt for national teams with neutral-venue handling. Tested the hardest case, the knockout stage (binary, no draws), training on everything else.
Checked it against real money
Pulled live 2026 World Cup prices from Kalshi & Polymarket and measured CLV from Kalshi's own minute-by-minute price history.
Adversarially verified every "edge"
A multi-agent research pass took the 12 biggest value bets and asked, with real team news: is the market wrong, or does it know something our score-only model doesn't?
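For the first step, here is a rough sketch of how an Elo rating with a Davidson draw model can be run walk-forward. The constants `K`, `HOME_ADV`, and `NU` are placeholder values I picked for illustration, not the fitted ones, and scoring a draw as half a win in the update is my assumption about how the two pieces fit together.

```python
from collections import defaultdict

K = 20.0         # rating step per match (assumed value)
HOME_ADV = 60.0  # rating points added to the home side (assumed value)
NU = 0.9         # Davidson draw parameter; larger means more draws (assumed value)

def davidson_probs(r_home, r_away, neutral=False):
    """Turn a rating gap into (home, draw, away) probabilities."""
    d = r_home - r_away + (0.0 if neutral else HOME_ADV)
    x = 10 ** (d / 800.0)  # square root of the Elo strength ratio
    denom = x + 1.0 / x + NU
    return x / denom, NU / denom, (1.0 / x) / denom

def walk_forward(matches):
    """matches: (date, home, away, home_goals, away_goals, neutral) tuples.

    Each match is predicted from ratings built only on earlier matches,
    then used to update the ratings, so no prediction sees the future.
    """
    ratings = defaultdict(lambda: 1500.0)
    preds = []
    for date, home, away, hg, ag, neutral in sorted(matches, key=lambda m: m[0]):
        p_h, p_d, p_a = davidson_probs(ratings[home], ratings[away], neutral)
        preds.append((date, home, away, p_h, p_d, p_a))
        expected = p_h + 0.5 * p_d  # a draw counts as half a win
        actual = 1.0 if hg > ag else 0.5 if hg == ag else 0.0
        delta = K * (actual - expected)
        ratings[home] += delta
        ratings[away] -= delta
    return preds, dict(ratings)

# Two made-up fixtures, deliberately out of order; ISO dates sort correctly as strings.
matches = [
    ("2024-01-02", "B", "A", 0, 0, False),
    ("2024-01-01", "A", "B", 2, 0, False),
]
preds, final_ratings = walk_forward(matches)
```

Predicting before updating is the whole point: the probabilities in `preds` are the ones that would be compared against the market price.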
The data
Clubs: football-data.co.uk. Results + Pinnacle closing odds, 5 leagues, ~14 seasons.
Internationals: martj42/international_results. Every national-team game since 1872 (~49k), plus 2026 fixtures and penalty shootouts.
Live markets: Kalshi per-match moneylines + price history, and Polymarket.
The results
No edge.
Clubs · 4,099 bets vs Pinnacle close
No edge
World Cup · CLV vs Kalshi
Indistinguishable from zero
Is the market actually right? Calibration over 73,077 outcomes
Expected value of "just bet" strategies (at the close)
Even the market's one flaw, that heavy favourites win slightly more often than their prices imply, nets out to break-even: the gain is almost exactly eaten by the 2.4% margin.
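To see why a small mispricing nets to nothing, it helps to work out what the margin costs. The sketch below uses made-up odds and assumes the margin is spread proportionally across outcomes, which is one common way to strip it out, not necessarily how any given book prices.

```python
def overround(odds):
    """Bookmaker margin: how far the implied probabilities sum above 1."""
    return sum(1.0 / o for o in odds) - 1.0

def fair_probs(odds):
    """Implied probabilities rescaled to sum to 1 (proportional method, an assumption)."""
    total = sum(1.0 / o for o in odds)
    return [(1.0 / o) / total for o in odds]

def ev_per_unit(p_true, odds):
    """Expected profit on a 1-unit bet at decimal odds, given the true win probability."""
    return p_true * odds - 1.0

# Hypothetical home/draw/away prices.
odds = [1.95, 3.50, 4.40]
margin = overround(odds)
probs = fair_probs(odds)
evs = [ev_per_unit(p, o) for p, o in zip(probs, odds)]
```

If the de-margined price is exactly right, every outcome has the same expected value, `1 / (1 + margin) - 1`, a loss a little smaller than the margin itself. A bias only pays if it is bigger than that.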
The verification · we found nothing
12 biggest "value bets." 0 survived.
I researched each one's team news and group situation. Every apparent edge came down to my model being wrong: a rating quirk, or something the market knew that the model couldn't.
Match · Model's bet · Edge · What the market knew that we didn't
The verdict
The price is already the best guess available.
Bet against the market and you're usually the one who's wrong. Bet with it and you just hand over the fee. Either way the expected value comes out negative.
If there's an edge to find, it isn't at the World Cup, one of the most heavily traded events in sport. It would be in smaller, ignored markets, and only with information the price doesn't already contain. That's the obvious next thing to try, and it would face the same test: positive CLV on data the model never saw.