Mistral vs. Claude on our onboarding: 4× faster, 30% cheaper | Squidler<br>← BlogOnboarding · new hireHi, I'm your new agent. What are you hoping to hand off?
Writing. A blog post about swapping our AI models for European ones.
Got it. One-off, or the start of a series you'll regret announcing?
…the second one.
Understood. I'll need the benchmarks, the eval numbers, and your tolerance for em dashes.
Zero. It is exactly zero.
We're based in Malmö, and we've always wanted a European tech stack. It's part of why we run on Hetzner, for one. The goal is an all-European stack , and we'll push for it. But the drive to build an awesome product is equally strong, so where a European option can't yet carry the job, we'll take the best model available and make sure it runs under EU legislation. We are not even an option for many of our potential customers if we don't handle their data under EU rules.
But with AI, most of the frontier is American, and our early investigations landed us on Anthropic's Claude for most of the models in our stack. It has taken us to where we are today with extreme speed, but now it's time to slow down and see how we can reach that all-European goal.
The first candidate
The first conversation you have when you hire a Squidler specialist is driven by an onboarding agent that works out what you actually need. As it runs fairly isolated from the rest of the system, it feels like a good candidate to start with.
Onboarding runs on Claude Sonnet 4.6 today, as it strikes a good balance between quality, speed, and price for the task. Initially we also evaluated Haiku 4.5, Anthropic's cheaper tier, but didn't get the quality we wanted. To find a European model that could stand next to it, we pulled the published benchmarks (Artificial Analysis for capability and speed, the vendors for price). One candidate stood out.
Metric<br>Mistral Medium 3.5 🇫🇷<br>Claude Sonnet 4.6 🇺🇸<br>Claude Haiku 4.5 🇺🇸
Intelligence Index<br>30<br>36<br>24
Output speed<br>92 tok/s<br>44.8 tok/s<br>92 tok/s
Time to first token<br>2.06s<br>1.26s<br>0.93s
Price (in / out per 1M)<br>$1.50 / $7.50<br>$3 / $15<br>$1 / $5
Numbers as of July 2, 2026, for each vendor's own API. They drift as new benchmark runs land.
Mistral Medium 3.5 sits six points behind Sonnet on intelligence, at half the price. Its time to first token, the wait before anything shows on screen, is a bit slower than Sonnet's, but its output speed matches Haiku's, twice Sonnet's, so it should produce the complete reply quicker. On paper, Mistral is the obvious one to test: created in France, and served through their own API with data hosted in the EU. The all-European option, if the quality holds.
The test
We don't judge models on vibes. Onboarding has its own eval suite: 17 scripted conversation fixtures, each run three times, scored with promptfoo on the things that matter. Does the agent surface a concrete problem, hand off cleanly, handle multiple languages, and stay grounded instead of making things up?
Result<br>Mistral Medium 3.5 🇫🇷<br>Claude Sonnet 4.6 🇺🇸<br>Claude Haiku 4.5 🇺🇸
Quality (checks passed)<br>46 / 51 (90%)<br>48 / 51 (94%)<br>39 / 51 (76%)
Speed (median per turn)<br>~1.9s<br>~7.6s<br>~1.9s
Cost (per session)<br>~$0.044<br>~$0.063<br>~$0.035
Each check is one fixture run graded pass/fail as a whole, the subjective ones by a Claude model, a bias that favors the incumbent if anything. A two-check gap on 51 runs is within run-to-run noise. Speed and cost come from replayed eight-turn sessions per model.
On quality, Mistral Medium 3.5 came out almost as good as Sonnet in our tests, in line with the published numbers. The surprise is speed: about 4× as quick per turn , double what the spec sheet promised. Pulling the raw replies from the eval explains it: once the conversation gets going, Mistral averaged roughly 170 output tokens per turn where Sonnet used about 500 . Same job, a third of the words. As in the example below:
Eval fixture · email-vague · one of 17Hey! Shall we talk in English, Swedish, or another language?
English
Anything on your mind you'd like help with?
Honestly I'm drowning in email. Half of it is junk but the important stuff is buried in there somewhere and I keep missing things.
The next reply is the model's, same conversation for both<br>Claude Sonnet 4.642words · passes 3/3 runs<br>That's a real problem — missing something important because it's buried is stressful. Is the main pain that you're missing things you need to act on(reply, follow up, do something), or more that you can't find information when you need it later?
Mistral Medium 3.514words · passes 3/3 runs<br>That sounds like a real pain. Do you use Gmail or another email service?
Some readers will prefer Sonnet's fuller reply here, and that's fair. Both pass the same checks.
Worth noting: the per-turn figure is total time, and felt latency is not the same as measured latency. What you feel first is time to first token. The spec sheet has Mistral slower to the first word, but we never felt it: a median Mistral...