CursorBench 3.1 (With Fable 5)

davidgomes | 1 pts | 0 comments

Cursor · CursorBench


[Figure: A scatter and line chart comparing Fable 5, Opus 4.8, Opus 4.7, GPT-5.5, Sonnet 4.6, Composer 2.5, and Composer 2 scores against average cost per task. Vertical axis: CursorBench 3.1 score (30% to 75%). Horizontal axis: average cost per task ($0 to $20). Labeled points: Fable 5 high (default), Composer 2.5, GPT-5.5 medium (default), Gemini 3.5 Flash, Opus 4.8 high (default), Sonnet 4.6 high (default), Kimi 2.6.]

| Rank | Model | Score | Cost / task | Tokens / task | Steps / task |
|---|---|---|---|---|---|
| 1 | Fable 5 Max | 72.9% | $18.02 | 63,842 | 76 |
| 2 | Fable 5 Extra High | 72.0% | $13.74 | 48,754 | 63 |
| 3 | Fable 5 High | 70.6% | $10.81 | 37,173 | 54 |
| 4 | Fable 5 Medium | 69.8% | $8.27 | 28,507 | 47 |
| 5 | Opus 4.7 Max | 64.8% | $11.02 | 62,989 | 96 |
| 6 | GPT-5.5 Extra High | 64.3% | $4.37 | 17,905 | 46 |
| 7 | Fable 5 Low | 64.2% | $5.70 | 18,882 | 36 |
| 8 | Opus 4.8 Max | 63.8% | $7.59 | 77,370 | 60 |
| 9 | Composer 2.5 | 63.2% | $0.55 | 15,152 | 37 |
| 10 | GPT-5.5 High | 62.6% | $3.59 | 13,329 | 40 |
| 11 | Opus 4.8 Extra High | 62.1% | $6.14 | 55,622 | 54 |
| 12 | Opus 4.7 Extra High | 61.6% | $7.11 | 43,942 | 72 |
| 13 | Opus 4.7 High | 59.4% | $5.01 | 32,227 | 59 |
| 14 | GPT-5.5 Medium | 59.2% | $2.22 | 9,065 | 35 |
| 15 | Opus 4.8 High | 58.4% | $4.41 | 36,788 | 45 |
| 16 | Opus 4.8 Medium | 56.6% | $3.83 | 31,684 | 41 |
| 17 | Opus 4.8 Low | 54.3% | $2.93 | 22,726 | 36 |
| 18 | Opus 4.7 Medium | 52.7% | $2.93 | 19,193 | 41 |
| 19 | Composer 2 | 52.2% | $0.56 | 14,163 | 40 |
| 20 | Gemini 3.5 Flash | 49.8% | $1.94 | 35,105 | 79 |
| 21 | Sonnet 4.6 Max | 49.0% | $3.09 | 40,280 | 55 |
| 22 | GPT-5.5 Low | 48.8% | $1.19 | 4,923 | 24 |
| 23 | Sonnet 4.6 High | 48.8% | $3.06 | 37,352 | 57 |
| 24 | Opus 4.7 Low | 48.3% | $1.87 | 13,164 | 29 |
| 25 | Kimi 2.6 | 47.6% | $1.27 | 24,783 | 56 |
| 26 | Sonnet 4.6 Medium | 46.0% | $2.64 | 31,360 | 50 |
| 27 | Sonnet 4.6 Low | 41.5% | $1.89 | 21,211 | 50 |
| 28 | Kimi 2.5 | 31.9% | $0.87 | 9,446 | 30 |

Changelog

CursorBench 3.1
Introduced problems focused on codebase understanding, bugfinding, planning, and code review.
Improved grading criteria for some edit tasks.

CursorBench 3.0
Initial set of tasks focused on edit, refactor, and bugfix problems.

Avg cost / task is computed by applying each model's published per-million-token pricing (input, cache read, cache write, and output) to the tokens it used on each CursorBench 3.1 task, then averaging across tasks. Results are subject to variance; small differences in scores may not be statistically meaningful.
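The cost calculation described above can be sketched roughly as follows. This is a minimal illustration, not Cursor's actual implementation: the `Pricing` and `TaskUsage` types, the function names, and every price and token count are hypothetical, and the sketch assumes each task's usage is already split into the four token categories named in the note.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per million tokens. Values used below are made up for illustration."""
    input: float
    cache_read: float
    cache_write: float
    output: float


@dataclass(frozen=True)
class TaskUsage:
    """Token counts a model consumed on a single task (hypothetical shape)."""
    input: int
    cache_read: int
    cache_write: int
    output: int


def task_cost(usage: TaskUsage, pricing: Pricing) -> float:
    """Apply per-million-token prices to one task's token counts."""
    total = (
        usage.input * pricing.input
        + usage.cache_read * pricing.cache_read
        + usage.cache_write * pricing.cache_write
        + usage.output * pricing.output
    )
    return total / 1_000_000


def avg_cost_per_task(usages: list[TaskUsage], pricing: Pricing) -> float:
    """Average the per-task costs across all tasks."""
    if not usages:
        raise ValueError("need at least one task to average over")
    return sum(task_cost(u, pricing) for u in usages) / len(usages)


# Hypothetical prices and usage, chosen only to make the arithmetic easy to follow.
example_pricing = Pricing(input=2.0, cache_read=0.5, cache_write=2.5, output=10.0)
example_usages = [
    TaskUsage(input=1_000_000, cache_read=0, cache_write=0, output=0),
    TaskUsage(input=0, cache_read=2_000_000, cache_write=1_000_000, output=100_000),
]
print(f"${avg_cost_per_task(example_usages, example_pricing):.2f}")
```

Costing each task first and then averaging follows the order given in the note. Because this is an unweighted mean over tasks, a few token-heavy tasks can still raise the reported figure noticeably.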

© 2026 Anysphere, Inc. SOC 2 Certified

