CursorBench 3.1 (With Fable 5)

Cursor · CursorBench


Figure: scatter and line chart of CursorBench 3.1 score (y-axis, 30% to 75%) against average cost per task (x-axis, $0 to $20), comparing Fable 5, Opus 4.8, Opus 4.7, GPT-5.5, Sonnet 4.6, Composer 2.5, and Composer 2. Labeled points: Fable 5 high (default), Composer 2.5, GPT-5.5 medium (default), Gemini 3.5 Flash, Opus 4.8 high (default), Sonnet 4.6 high (default), Kimi 2.6.

| Rank | Model | Score | Cost / task | Tokens / task | Steps / task |
|---|---|---|---|---|---|
| 1 | Fable 5 Max | 72.9% | $18.02 | 63,842 | 76 |
| 2 | Fable 5 Extra High | 72.0% | $13.74 | 48,754 | 63 |
| 3 | Fable 5 High | 70.6% | $10.81 | 37,173 | 54 |
| 4 | Fable 5 Medium | 69.8% | $8.27 | 28,507 | 47 |
| 5 | Opus 4.7 Max | 64.8% | $11.02 | 62,989 | 96 |
| 6 | GPT-5.5 Extra High | 64.3% | $4.37 | 17,905 | 46 |
| 7 | Fable 5 Low | 64.2% | $5.70 | 18,882 | 36 |
| 8 | Opus 4.8 Max | 63.8% | $7.59 | 77,370 | 60 |
| 9 | Composer 2.5 | 63.2% | $0.55 | 15,152 | 37 |
| 10 | GPT-5.5 High | 62.6% | $3.59 | 13,329 | 40 |
| 11 | Opus 4.8 Extra High | 62.1% | $6.14 | 55,622 | 54 |
| 12 | Opus 4.7 Extra High | 61.6% | $7.11 | 43,942 | 72 |
| 13 | Opus 4.7 High | 59.4% | $5.01 | 32,227 | 59 |
| 14 | GPT-5.5 Medium | 59.2% | $2.22 | 9,065 | 35 |
| 15 | Opus 4.8 High | 58.4% | $4.41 | 36,788 | 45 |
| 16 | Opus 4.8 Medium | 56.6% | $3.83 | 31,684 | 41 |
| 17 | Opus 4.8 Low | 54.3% | $2.93 | 22,726 | 36 |
| 18 | Opus 4.7 Medium | 52.7% | $2.93 | 19,193 | 41 |
| 19 | Composer 2 | 52.2% | $0.56 | 14,163 | 40 |
| 20 | Gemini 3.5 Flash | 49.8% | $1.94 | 35,105 | 79 |
| 21 | Sonnet 4.6 Max | 49.0% | $3.09 | 40,280 | 55 |
| 22 | GPT-5.5 Low | 48.8% | $1.19 | 4,923 | 24 |
| 23 | Sonnet 4.6 High | 48.8% | $3.06 | 37,352 | 57 |
| 24 | Opus 4.7 Low | 48.3% | $1.87 | 13,164 | 29 |
| 25 | Kimi 2.6 | 47.6% | $1.27 | 24,783 | 56 |
| 26 | Sonnet 4.6 Medium | 46.0% | $2.64 | 31,360 | 50 |
| 27 | Sonnet 4.6 Low | 41.5% | $1.89 | 21,211 | 50 |
| 28 | Kimi 2.5 | 31.9% | $0.87 | 9,446 | 30 |

Changelog

CursorBench 3.1: Introduced problems focused on codebase understanding, bug finding, planning, and code review. Improved grading criteria for some edit tasks.

CursorBench 3.0: Initial set of tasks focused on edit, refactor, and bugfix problems.

Avg cost / task is computed by applying each model's published per-million-token pricing (input, cache read, cache write, and output) to the tokens it used on each CursorBench 3.1 task, then averaging across tasks. Results are subject to variance; small differences in scores may not be statistically meaningful.
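The cost calculation described above can be sketched roughly as follows. This is a minimal illustration, not CursorBench's actual code: the `Pricing` and `TaskUsage` types, the function names, the prices, and the token counts are all hypothetical, and the sketch assumes each task reports separate counts for the four token categories named in the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. Values used below are made up for illustration."""
    input: float
    cache_read: float
    cache_write: float
    output: float


@dataclass(frozen=True)
class TaskUsage:
    """Token counts one model consumed on one task (hypothetical shape)."""
    input: int
    cache_read: int
    cache_write: int
    output: int


def task_cost(usage: TaskUsage, pricing: Pricing) -> float:
    """Price each token category at its per-million rate and sum the result."""
    total = (
        usage.input * pricing.input
        + usage.cache_read * pricing.cache_read
        + usage.cache_write * pricing.cache_write
        + usage.output * pricing.output
    )
    return total / 1_000_000


def average_cost_per_task(usages: list[TaskUsage], pricing: Pricing) -> float:
    """Compute the cost of every task first, then average across tasks."""
    if not usages:
        raise ValueError("at least one task is required")
    return sum(task_cost(u, pricing) for u in usages) / len(usages)


# Hypothetical prices and usage, chosen only to make the arithmetic easy to follow.
example_pricing = Pricing(input=2.0, cache_read=0.2, cache_write=2.5, output=10.0)
example_usages = [
    TaskUsage(input=100_000, cache_read=500_000, cache_write=40_000, output=20_000),
    TaskUsage(input=50_000, cache_read=200_000, cache_write=20_000, output=10_000),
]

if __name__ == "__main__":
    avg = average_cost_per_task(example_usages, example_pricing)
    print(f"Average cost per task: ${avg:.3f}")
```

With these made-up numbers the first task costs about $0.60 and the second about $0.29, so the average is about $0.445 per task. Because cached reads are typically priced well below fresh input, two models with similar total token counts can still land at quite different costs.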

