Gemini 3.5 Flash beats Opus 4.8 on bluffbench

ionychal | 1 pts | 0 comments

@simonpcouch.com on Bluesky

Simon P. Couch

simonpcouch.com

Re-ran this eval against Opus 4.8, Gemini 3.5 Flash, and GPT 5.5. Opus 4.8 is a modest improvement over the previously tested Opus models, but Gemini 3.5 Flash is the real stand-out!

simonpcouch.github.io/bluffbench/

[contains quote post or other embedded content]

2026-05-28T19:41:06.976Z

opus gemini flash simonpcouch bluffbench bluesky