AI and Math State of the Art


AI and Math: State of the Art as of June 2026


AI

Research-level mathematics

June 2026

AI and Math: State of the Art

From Olympiad gold to Erdős problems, First Proof, Lean, and human verification.

Concise factual briefing

Summer 2025 → June 2026

Map

What changed in one year?

01

Contest reasoning crossed gold level

02

Erdős problems became the live testbed

03

AI disproved a famous Erdős conjecture

04

Formal proof search scaled in Lean

05

Failure modes and costs were documented

06

Tao and Gowers revised their priors

Source: Tao GitHub wiki, OpenAI unit-distance disproof, First Proof Second Batch

The Year in Events

01

Part One · Timeline

July 2025 → June 2026: from Olympiad gold to a disproved Erdős conjecture, told as a sequence of dated, sourced milestones.

July 2025

JUL 21 2025 · INTERNATIONAL MATHEMATICAL OLYMPIAD

Gemini Deep Think and OpenAI reached gold-medal level.

Natural-language proofs, five of six problems.

35 / 42

gold threshold performance

In July 2025 both Google DeepMind and OpenAI reported gold-medal-level performance at the International Mathematical Olympiad. Each produced natural-language proofs for five of the six problems, scoring 35 of 42 points (each problem is worth 7), enough for a gold medal.

There is a verification asymmetry worth flagging: DeepMind’s run, using an advanced version of Gemini Deep Think, was officially graded by the IMO coordinators, whereas OpenAI reported a gold-level score from its own evaluation, which the official coordinators did not grade.

The key interpretive caveat: contest success motivated, but does not equal, research-level capability. Olympiad problems are curated, bounded, and have known answer structures; research problems require interpretation, judgment of relevance, and relation to existing literature. Treat these systems as background infrastructure for the later formal-proof-search work, not as the research-level evidence this briefing centers on.

Source: Google DeepMind IMO 2025, Axios summary

October 2025

OCT 2025 · THE CAUTIONARY CASE

GPT-5 “solved ten open problems” — by finding ten papers.

Literature search, not new mathematics.

10

existing papers found — zero new proofs

The sequence: in October 2025 OpenAI’s Kevin Weil posted that GPT-5 had “found solutions to 10 (!) previously unsolved Erdős problems and made progress on 11 others,” and Sébastien Bubeck amplified similar claims.

Thomas Bloom, who maintains erdosproblems.com, called it “a dramatic misrepresentation.” The subtlety is what “open” means in his database: only that he personally had not seen a paper solving the problem, not that it had resisted the field for decades. GPT-5 had simply done an effective literature search and surfaced existing published papers that Bloom had missed. Bubeck conceded that “only solutions in the literature were found,” Weil deleted his post, and Demis Hassabis called the episode “embarrassing.”

This is the cleanest cautionary tale in the whole subject: “AI found a solution” can quietly mean “AI found a paper.” It directly motivated the careful verification protocols — Lean formalization, human-verified companion papers — used in the genuine 2026 results that follow. Note the model here was plain GPT-5; the legitimate later Erdős solves used more advanced models (GPT-5.2 Pro, GPT-5.4 Pro).

Source: erdosproblems.com (Bloom’s database), Tao — AI contributions to Erdős problems

November 2025

NOV 3 2025 · ALPHAEVOLVE MATH PAPER

AlphaEvolve moved from coding agent to math explorer.

Construction search, not theorem proving.

67

problems across analysis, combinatorics, geometry, number theory

What AlphaEvolve actually is: a Gemini-powered evolutionary coding agent. It writes and iteratively mutates Python programs that search for mathematical constructions — explicit objects like point sets, sequences, packings, matrices — and scores each candidate with a cheap automated evaluator. The loop keeps the high-scoring programs and mutates them further. So the core activity is searching the space of constructions to optimize a numerical objective.
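The keep-and-mutate loop can be illustrated with a toy sketch. This is not AlphaEvolve's code: it mutates candidate constructions directly rather than mutating programs with a language model, and the objective (spreading points in the unit square), the function names `score`, `mutate`, and `evolve`, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def score(points):
    """Cheap automated evaluator: smallest pairwise distance (higher is better)."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


def mutate(points, rng, step=0.1):
    """Return a copy with one point nudged, clamped to the unit square."""
    child = list(points)
    i = rng.randrange(len(child))
    x, y = child[i]
    child[i] = (
        min(1.0, max(0.0, x + rng.uniform(-step, step))),
        min(1.0, max(0.0, y + rng.uniform(-step, step))),
    )
    return child


def evolve(n_points=5, population=20, keep=5, generations=200, seed=0):
    """Keep the highest-scoring candidates each round and mutate them further."""
    rng = random.Random(seed)
    pool = [
        [(rng.random(), rng.random()) for _ in range(n_points)]
        for _ in range(population)
    ]
    history = []  # best score seen at the start of each generation
    for _ in range(generations):
        pool.sort(key=score, reverse=True)
        survivors = pool[:keep]
        history.append(score(survivors[0]))
        # Survivors are carried over unchanged, so the best score never drops.
        pool = survivors + [
            mutate(rng.choice(survivors), rng)
            for _ in range(population - keep)
        ]
    return max(pool, key=score), history


best, history = evolve()
print(f"best minimum distance found: {score(best):.4f}")
```

Because the survivors are carried into the next round untouched, the best score is non-decreasing across generations. The quality of the result still depends entirely on whether `score` measures what the mathematics actually requires.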

Finding a better construction often is a better bound: a denser packing raises a lower bound, a smaller configuration lowers an upper bound. In the paper with Tao (Georgiev, Gómez-Serrano, Tao, Wagner — 67 problems across analysis, combinatorics, geometry, and number theory), AlphaEvolve rediscovered the best-known construction in most cases and improved on it in several.
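A small illustration of how an explicit object becomes a bound once it is checked exactly. The example problem is an assumption chosen for brevity and is not taken from the paper: Sidon sets, where all pairwise sums must be distinct. Any verified Sidon set of size k inside {1..n} certifies that the maximum possible size is at least k. The helper names `is_sidon` and `certified_lower_bound` are hypothetical.

```python
from itertools import combinations_with_replacement


def is_sidon(values):
    """True if all sums a + b (with a <= b) are distinct."""
    sums = [a + b for a, b in combinations_with_replacement(sorted(values), 2)]
    return len(sums) == len(set(sums))


def certified_lower_bound(candidate, n):
    """Return len(candidate) if it is a valid Sidon set inside {1..n}, else None."""
    distinct = len(set(candidate)) == len(candidate)
    in_range = all(isinstance(v, int) and 1 <= v <= n for v in candidate)
    if distinct and in_range and is_sidon(candidate):
        return len(candidate)
    return None


# A search procedure might propose these; only the exact check makes a bound.
print(certified_lower_bound([1, 2, 5, 7], 7))  # valid: certifies a bound of 4
print(certified_lower_bound([1, 2, 3, 4], 7))  # invalid: 1 + 4 == 2 + 3
```

The check uses exact integer arithmetic, so it cannot be fooled the way a floating-point or sampled evaluator can. The search can be as heuristic as it likes, provided the final object passes a verifier of this kind.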

Crucial caveat: this is construction search, not theorem proving. It produces objects, not proofs. And because it optimizes against an automated evaluator, it is “extremely good at locating exploits” in a weak verifier — specification gaming. It only counts as mathematics when the objective is sound and the construction is then checked or proved by a human or proof system. The 4×4 matrix-multiplication headline (a 48-multiplication algorithm) should be stated carefully — it is a construction in a specific algebraic setting, not an...
