MiniMax M3 Review: Matching GPT-5.5 and Opus?

Thomas Wiegold Blog

I don't really enjoy writing model reviews. There, I said it. After the tenth "this new model is faster and smarter than the last one" post, you start to feel like you're describing the same car with a fresh coat of paint. So when I tell you this MiniMax M3 review is one I actually wanted to write, take it as a signal. M3 is interesting. Not "interesting for a Chinese open-weights model." Just interesting, full stop.

I've been in the MiniMax corner for a while now. I liked M2.5 when it landed, and I liked M2.7 even more. But there was always the same asterisk in the back of my mind: genuinely good, just not GPT-or-Opus good. A gap you could feel. This time the gap might have closed. So I ran my usual battery of tests, watched the thing think for an uncomfortably long time, and came away mostly impressed. Here's the whole story.

What Is MiniMax M3?

MiniMax M3 is an open-weights, natively multimodal model (text, image, and video in, text out) that launched on June 1, 2026 with a 1 million token context window. It's the course-correction in the M-series: where the M2 generation deliberately ditched sparse attention over production worries, M3 brings it back as the headline feature.

That feature is called MiniMax Sparse Attention, or MSA. The short version for anyone who doesn't want the linear-algebra lecture: a lightweight index branch scans incoming tokens, picks which key-value blocks actually deserve attention, and only runs the expensive math on those. The clever bit is that it does this on the real, uncompressed key-values, so you don't pay the long-context precision tax that something like DeepSeek's latent attention does. MiniMax claims a roughly 9x speedup on prefill and 15x on decode at 1M tokens, with quality holding steady in their ablations.
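To make the "index branch picks blocks, attention runs only on those" idea concrete, here is a toy sketch in plain Python. It is not MiniMax's implementation: the block scoring rule (query dotted with each block's mean key), the function names, and the `block_size` and `top_k` parameters are all my own assumptions for illustration. The one thing it does mirror from the description above is that the final attention step uses the original, uncompressed keys and values of the selected blocks.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_blocks(query, keys, block_size, top_k):
    """Hypothetical index branch: score each block by query . mean(key)."""
    scored = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scored.append((dot(query, mean_key), start // block_size))
    best = sorted(scored, reverse=True)[:top_k]
    return sorted(index for _, index in best)


def sparse_attention(query, keys, values, block_size, top_k):
    """Attend only over the selected blocks, using their real keys/values."""
    blocks = select_blocks(query, keys, block_size, top_k)
    positions = [
        p
        for b in blocks
        for p in range(b * block_size, min((b + 1) * block_size, len(keys)))
    ]
    scale = 1 / math.sqrt(len(query))
    weights = softmax([dot(query, keys[p]) * scale for p in positions])
    dim = len(values[0])
    output = [
        sum(w * values[p][d] for w, p in zip(weights, positions))
        for d in range(dim)
    ]
    return output, blocks


# Toy data: four tokens in two blocks; the query lines up with block 1.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [2.0], [3.0], [4.0]]
output, chosen = sparse_attention([0.0, 1.0], keys, values, block_size=2, top_k=1)
```

With `top_k` set to the total number of blocks, this collapses back to ordinary dense attention, which is a handy sanity check. The saving comes from keeping `top_k` small while the context grows.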

Why should you care about that more than the benchmark numbers? Because a quadratic-attention model can technically hold a million tokens, but actually using them is miserable. Prefill alone can take minutes. If MSA's speedups hold up under real load, that's the difference between "1M context exists on the spec sheet" and "1M context is something you'd actually build an agent around." That's the part that matters.

On pricing, it's aggressive. Standard pay-as-you-go is $0.60 per million input tokens and $2.40 per million output, with a 50% launch promo for the first week. That's somewhere between a tenth and a twentieth of what closed frontier models cost. You can run it right now through the MiniMax API, OpenRouter (OpenAI-compatible, easiest path), and a handful of launch partners.
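If you want to see what those rates mean for a single request, the arithmetic is short. This is a rough sketch using only the list prices quoted above; the function name is made up, and I'm assuming the 50% promo applies equally to input and output tokens, which the announcement doesn't spell out.

```python
INPUT_USD_PER_MILLION = 0.60   # standard pay-as-you-go input rate
OUTPUT_USD_PER_MILLION = 2.40  # standard pay-as-you-go output rate
LAUNCH_DISCOUNT = 0.50         # assumption: applies to both input and output


def request_cost_usd(input_tokens, output_tokens, promo=False):
    """Estimate the cost of one request in US dollars."""
    cost = (
        input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    )
    return cost * (1 - LAUNCH_DISCOUNT) if promo else cost


# A full 1M-token prompt with a 10k-token answer.
full_context = request_cost_usd(1_000_000, 10_000)
full_context_promo = request_cost_usd(1_000_000, 10_000, promo=True)
print(f"standard: ${full_context:.3f}, promo: ${full_context_promo:.3f}")
```

So filling the entire context window once costs roughly sixty cents at list price, before any caching or provider markup, neither of which is modeled here.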

One honest flag before we go further: at launch the parameter count is undisclosed, and the "open-weights" part is still a promise. The weights weren't on Hugging Face yet (MiniMax says "within 10 days"). So keep your enthusiasm calibrated. More on that later.

Putting MiniMax M3 Through My Usual Tests

Here's my process, which never changes, because that's the only way I can compare across releases instead of just vibing off first impressions. I run the same three tasks on every serious model: two website builds, a poker simulation terminal program, and a full code audit of my own site, thomas-wiegold.com. Same prompts, same expectations, every time.

Website one: the Sydney coffee roaster

This is one of those prompts I've run so many times I could recite the output styles in my sleep. Funny thing about it: every single model picks more or less the same color palette for a Sydney coffee roaster. GPT, Opus, Gemini, now MiniMax. There must be something deep in the training data that screams "warm browns and cream" the moment you say "coffee." I've stopped fighting it.

What separates the models is everything else, and M3 nailed everything else. The layout was clean and considered, the technical execution was solid, and honestly it was one of the best results I've gotten for this prompt to date. Right up there with the closed frontier models. That alone made me sit up, because this is exactly the kind of task where MiniMax used to be "fine, but."

Website two: the pop-culture online store

I push the complexity up here. More interactivity, more visual flair, more chances to fall apart. M3 handled it well. Nice animations, good structure, the sort of result you'd be happy to hand off as a starting point rather than a throwaway demo. Probably the second-best result I've ever gotten for this particular prompt.

Second-best, because Gemini still had a slight edge on the design polish. If you've read my take on Gemini 3.5 Flash in Google Antigravity, you'll know I rate Gemini's web design specifically while preferring...
