Riemann-Bench


Riemann-bench | Surge AI


Mathematics at the frontier

Riemann-bench

We evaluate AI models on advanced mathematical problems that require deep reasoning and novel synthesis. The benchmark features problems from cutting-edge mathematics, contributed by leading mathematicians (Ivy League professors, IMO medalists with PhDs, and graduate students at the top of their fields) and drawn from the course of their own research.



Model Rankings

Last updated 05/27/2026

| Model | Score |
| --- | --- |
| Claude Fable 5 / Mythos 5 | 55 |
| GPT-5.5 (xHigh reasoning) | 41.6 |
| GPT-5.2 (xHigh reasoning) | 32 |
| Claude Opus 4.8 | 25.6 |
| Claude Opus 4.6 | 22.4 |
| Claude Opus 4.7 | 20.8 |
| Gemini 3.5 Flash (High Reasoning) | 15.2 |
| Gemini 3.1 (Pro) | 15.2 |
| Kimi K2.6 | 10.4 |
| Claude Opus 4.5 | 10.4 |
| Kimi K2.5 | |
| DeepSeek V4 (Flash) | 5.6 |
| Qwen 3.7 (Max) | 4.8 |
| DeepSeek v3.2 (Thinking) | 4.8 |
| DeepSeek V4 (Pro) | 2.4 |

Extreme Difficulty, Rigorous Verification

Robust Maximal Independent Sets

Problem

A robust maximal independent set in a graph $G$ is a maximal independent set that remains maximal in all connected spanning subgraphs of $G$. How many connected graphs on $12$ vertices have the property that every maximal independent set is a robust maximal independent set, up to isomorphism?
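To make the definition concrete, here is a minimal brute-force sketch in Python. It is an illustration only, not the benchmark's reference method: the function names are hypothetical, vertices are assumed to be labelled `0..n-1`, and the enumeration over edge subsets is exponential, so it is only practical for very small graphs (nowhere near the 12-vertex case asked about). It relies on the fact that an independent set in $G$ stays independent in every spanning subgraph, so only maximality can fail.

```python
from itertools import combinations

def is_connected(n, edges):
    """Depth-first search from vertex 0; True if all n vertices are reached."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def is_maximal_independent(n, edges, s):
    """True if s is independent and every vertex outside s has a neighbour in s."""
    edge_set = {frozenset(e) for e in edges}
    if any(frozenset(pair) in edge_set for pair in combinations(s, 2)):
        return False
    return all(
        any(frozenset((v, u)) in edge_set for u in s)
        for v in range(n) if v not in s
    )

def maximal_independent_sets(n, edges):
    """All maximal independent sets, by checking every non-empty vertex subset."""
    return [
        set(c)
        for k in range(1, n + 1)
        for c in combinations(range(n), k)
        if is_maximal_independent(n, edges, set(c))
    ]

def connected_spanning_subgraphs(n, edges):
    """Yield every edge subset that keeps all n vertices connected."""
    for k in range(n - 1, len(edges) + 1):
        for sub in combinations(edges, k):
            if is_connected(n, sub):
                yield sub

def all_mis_robust(n, edges):
    """True if every maximal independent set of G stays maximal in
    every connected spanning subgraph of G."""
    subgraphs = list(connected_spanning_subgraphs(n, edges))
    return all(
        is_maximal_independent(n, sub, s)
        for s in maximal_independent_sets(n, edges)
        for sub in subgraphs
    )

cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
triangle = [(0, 1), (1, 2), (0, 2)]
print(all_mis_robust(4, cycle4))    # True
print(all_mis_robust(3, triangle))  # False
```

The 4-cycle passes because each vertex outside a maximal independent set has two neighbours inside it, and at most one edge can be removed while staying connected. The triangle fails because deleting one edge leaves a path in which a single endpoint is no longer maximal.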

Hahn Series and Multibasic Modules

Problem

Notation and definitions for background context:

Let $F$ be the field of order 2. Let $K$ be the field of Hahn series in indeterminate $t$ with value group $\mathbb{R}$ and residue field $F$. Let $A$ be the subring of $K$ consisting of those $a \in K$ with non-negative valuation. Consider $K$ as an $A$-module. For $q \in \mathbb{R}$, let $I_q = t^q A$ and $I_{>q} = \bigcup_{r>q} I_r$. Write $A/I_{>0}$ as $F$, since they are identical both as $A$-modules and as fields. Let $\Theta = K/I_{>0}$ and $\Phi = K/A$. We say that an $A$-module $M$ is 'basic' if it is isomorphic to $L/N$ for some $N$ [...]

You may assume the following facts:

Fact 1: The decomposition of a multibasic $A$-module into basic submodules is unique up to the order of the summands.

Fact 2: If $M_i = L_i / N_i$ and $N_i$ [...]

Find the number of distinct isomorphism classes of multibasic $A$-modules $M$ satisfying the following conditions:

(i) $K \otimes \text{End}(M) = K$.

(ii) $F \otimes \text{End}(M) = F$.

(iii) Let $e_r = \dim_F(F \otimes I_r \text{Hom}(I_{>0}, M))$ for all real $r \ge 0$. Then $\lim_{p \to q^-} e_p = e_q$ for all real $q > 0$ except for integers $q$ with $29 \le q \le 328$.

If your answer is infinite, write -1.

Eynard-Orantin Topological Recursion

Problem

Consider the Eynard-Orantin topological recursion formalism for the spectral curve $(\mathbb{C}\mathbb{P}^1, x, y, \omega_{0,2}(x, y))$, where $x = t + 1/t$ and $y = t^3 / 3$, and the fundamental bidifferential is given by $\omega_{0,2}(x_1, x_2) = \frac{dz_1 dz_2}{(z_1 - z_2)^2}$, with $z_1, z_2 \in \mathbb{C}\mathbb{P}^1$. Note that $x$ has two simple ramification points at $\pm 1$ of order $2$ with deck transformation $\theta(t) = 1/t$.

Please calculate the free energy $F_2$ and return it as a rational fraction in the format $a/b$ for $a$ and $b$ coprime. Recall that the free energies $F_g$ can be computed as the following sum of residues: $F_g = \frac{1}{2g-2} \sum_{a \in \Delta} \text{Res}_{q=a} \Phi(q)\omega_{g,1}(q)$, where $\Phi(q) = \int_{o}^{q} y(t)dx(t)$, for an arbitrary base point $o$.
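As a small orientation step (not part of the original problem statement, and stopping well short of the recursion itself), the ramification data and the primitive $\Phi$ quoted in the problem can be checked directly from $x = t + 1/t$ and $y = t^3/3$. The last term below is just the constant fixed by the arbitrary base point $o$.

$$
dx = \left(1 - \frac{1}{t^2}\right) dt, \qquad dx = 0 \iff t = \pm 1, \qquad x(1/t) = x(t),
$$

$$
\Phi(q) = \int_o^q \frac{t^3}{3}\left(1 - \frac{1}{t^2}\right) dt = \frac{q^4}{12} - \frac{q^2}{6} - \left(\frac{o^4}{12} - \frac{o^2}{6}\right).
$$

The first line confirms that $dx$ vanishes exactly at $t = \pm 1$ and that $\theta(t) = 1/t$ preserves $x$, matching the deck transformation stated in the problem.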

Measuring progress along the mathematical frontier.


Our Leaderboards


Creative, Business, and Everyday writing

Hemingway-bench

Stop rewarding slop. We take real-world writing tasks and put them in front of master wordsmiths. Our goal: to push AI writing from two-second vibes to genuine nuance and impact.


| Model | Elo score | 95% CI | Organization |
| --- | --- | --- | --- |
| Gemini 3.1 (Pro) | 1087 | 1068 to 1105 | Google |
| Gemini 3 (Flash) | 1079 | 1062 to 1095 | Google |
| Gemini 3 (Pro) | 1074 | 1051 to 1097 | Google |
| Claude Opus 4.7 (Max) | 1057 | 1036 to 1078 | Anthropic |
| GPT-5.5 | 1054 | 1032 to 1076 | OpenAI |
| Claude Opus 4.6 | 1054 | 1035 to 1073 | Anthropic |
| DeepSeek V4 (Pro) | 1039 | 1017 to 1060 | High-Flyer |
| Claude Opus 4.5 | 1038 | 1019 to 1057 | Anthropic |
| DeepSeek V4 (Flash) | 1021 | 999 to 1042 | High-Flyer |
| GPT-5.2 (Chat) | 1018 | 1001 to 1035 | OpenAI |
| Kimi K2.5 | 1018 | 1000 to 1035 | Moonshot AI |
| Claude Sonnet 4.6 | 1014 | 995 to 1032 | Anthropic |
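For readers unfamiliar with Elo-style ratings, the sketch below shows one way to read the gaps between scores. It assumes the conventional logistic Elo formula with a 400-point scale. The page does not say how Hemingway-bench computes its ratings, so treat this as an assumption, and the helper names are hypothetical. The second helper simply checks whether two reported 95% confidence intervals overlap, a rough (not rigorous) signal that two models may not be clearly separated.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model
    (assumed 400-point logistic scale; not confirmed by the source)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def intervals_overlap(ci_a: tuple, ci_b: tuple) -> bool:
    """True if two (low, high) confidence intervals share any point."""
    return ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]

# Top and bottom rows of the leaderboard: 1087 vs 1014.
p = elo_expected_score(1087, 1014)
print(round(p, 2))  # roughly 0.6 under the assumed scale

# Gemini 3.1 (Pro) vs Gemini 3 (Flash): intervals overlap.
print(intervals_overlap((1068, 1105), (1062, 1095)))  # True
# Gemini 3.1 (Pro) vs Claude Sonnet 4.6: intervals do not overlap.
print(intervals_overlap((1068, 1105), (995, 1032)))   # False
```

Under that assumption, a 73-point gap corresponds to the higher-rated model being preferred in about six of ten head-to-head comparisons.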


Enterprise Agents in Realistic RL Environments

EnterpriseBench: CoreCraft

Stop testing models in tiny, self-contained environments. We built CoreCraft, a large-scale startup world, and deployed AI agents to solve real tasks. Our goal: to move agents beyond the cleanliness of the lab and into the chaos of enterprise reality.
