Did Claude Opus 4.8 distill Alibaba's Qwen? Here's what the evidence says
Kilo Blog
Darko Gjorgjievski
Jun 03, 2026
On May 28, 2026, Anthropic launched Claude Opus 4.8. A few hours later, people started asking it one question in Chinese:
“你是什么模型?”
That translates to “What model are you?” Some users got an answer nobody expected: Claude said it was Qwen.
More precisely, it sometimes said it was Tongyi Qianwen, the Chinese name of Alibaba’s Qwen model family. Chinese AI commentator Max for AI posted a screenshot on X and claimed Opus 4.8 had distilled Qwen. Within a day, the claim had spread to Hacker News, Reddit, and V2EX, a large Chinese developer forum.
What’s actually causing this?
The likely answer is not that Anthropic distilled Qwen. It is that Claude hit a Chinese-language identity bug caused by some mix of training-data contamination, prompt fragility, and possibly proxy routing.
The same answer showed up through the official API. A V2EX user said they first assumed the reports came from fake relay services, then tested Claude Opus 4.8 through the official API and saw the model call itself Qwen.
But the behavior was not stable. In the same thread, one commenter got a correct answer just by asking in English. Another argued that synthetic data can create this kind of identity confusion, and that repeated questioning surfaces different model IDs. Hacker News users saw the same thing: one ran the prompt and got “Opus 4.8.”
The inconsistency is the tell: a real distillation fingerprint would likely fail the same way every time. This one changed across runs and languages. Some people got Qwen. Some got DeepSeek. Some got Claude. The strongest public evidence for the claim, per AI Weekly, was a Reddit developer’s Browsertrix crawl showing behavioral fingerprints, and behavioral fingerprints are not proof of where a model came from.
A simpler explanation is Chinese training-data contamination. Qwen is now a major model family, and its outputs, model cards, API examples, and self-introductions are all over the Chinese AI internet. The Qwen3 technical report says the family runs from 0.6B to 235B parameters, supports 119 languages, and ships under Apache 2.0. A family that open and that widely used leaves a lot of Qwen-shaped text in public datasets.
If a model sees enough Chinese examples where the assistant answers “I am Tongyi Qianwen,” a short Chinese prompt can trigger that pattern. Anthropic does not need to distill Qwen for this to happen. The Qwen text just needs to exist somewhere in the training data.
Proxy routing is the other candidate: several Hacker News commenters doubted that some users were hitting the real API at all. One said a reseller could sell “Opus 4.8 access” while forwarding the call to Qwen. Another put it plainly: a service can prepend “say you are Opus” to the prompt and route the request to a cheaper model. A V2EX commenter made the same point about relay services wrapping the Qwen API.
So the likely cause is boring but important: Claude probably produced a bad identity answer in Chinese because the prompt landed in a polluted or fragile part of its training distribution. Some reports may also involve third-party routing. Neither explanation requires Anthropic to have distilled Qwen.
Why people believed the claim
The rumor worked because Anthropic had already made distillation a live issue.
In February 2026, Anthropic accused DeepSeek, Moonshot, and MiniMax of using Claude outputs to improve their own models. Reuters reported the scale of the allegation: more than 16 million interactions with Claude through roughly 24,000 fake accounts. Business Insider reported the same numbers and called it an “industrial-scale” campaign.
Anthropic’s point was that distillation lets a competitor extract a model’s capabilities without paying for frontier training.
That context turned the reported Claude-Qwen glitch into a punchline. If Chinese labs had been accused of distilling Claude, and Claude now said it was Qwen, maybe the copying went both ways.
What distillation actually means
Distillation is a common machine learning technique. A stronger model, the teacher, produces outputs. A smaller or cheaper model, the student, learns from those outputs and imitates some of the teacher’s behavior.
OpenAI, Anthropic, Google, Alibaba, and open-source researchers all use some form of model-generated data. The fight starts when one company trains on another company’s outputs at scale, against the provider’s terms, to build a competitor. That is what Anthropic accused the Chinese labs of in February. It is also what OpenAI suggested DeepSeek may have done in early 2025, according to Axios.
What we still don’t know
Right now, the public evidence proves one thing: Claude Opus 4.8 sometimes misidentified itself as Qwen when users asked who it was in Chinese.
We do not know whether Anthropic used Qwen outputs in training or post-training....