Meituan Trained a 1.6T-Parameter AI Model Without Nvidia GPUs

XYZ Labs


LongCat-2.0 is not the world's strongest model. The shock is that Meituan says the full training and deployment ran on domestic AI ASIC superpods.

XYZ Labs Jul 02, 2026

Meituan just released LongCat-2.0, a 1.6-trillion-parameter foundation model. But the real story is not the parameter count. According to the Chinese-language analysis this piece draws on, and to language in Meituan’s own model card, LongCat-2.0 was trained and deployed on AI ASIC superpods without relying on Nvidia GPUs. For China’s AI industry, that is the detail that turns a routine model launch into a geopolitical hardware story.

LongCat-2.0 uses a Mixture-of-Experts (MoE) architecture with 1.6T total parameters and about 48B activated parameters per token. Before its official release, it reportedly appeared on OpenRouter under the anonymous name Owl Alpha, where it entered the top three by total usage. In Claude Code agent scenarios, the Chinese article says LongCat-2.0 ranked second globally by usage, behind only Claude Opus 4.8.
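The difference between 1.6T total parameters and roughly 48B activated parameters comes from the Mixture-of-Experts design. A router sends each token to only a few experts, so only about 3% of the weights do work for any given token. The sketch below is a toy illustration under stated assumptions and does not show LongCat-2.0’s router. The expert count, the top-k value, and the function names (`active_fraction`, `top_k_route`) are hypothetical, because the article does not describe Meituan’s routing scheme.

```python
import math

# Headline figures as reported for LongCat-2.0 (total vs. per-token activated).
TOTAL_PARAMS = 1.6e12
ACTIVE_PARAMS = 48e9


def active_fraction(total: float, active: float) -> float:
    """Share of all weights that participate in a single token's forward pass."""
    return active / total


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Toy MoE router (hypothetical, not Meituan's): keep the k experts with the
    highest router scores and softmax-normalize their gates so they sum to 1."""
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


if __name__ == "__main__":
    print(f"activated share per token: {active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS):.1%}")
    # Hypothetical router scores for one token over 8 experts; only 2 are run.
    scores = [0.1, 2.0, -1.0, 0.5, 1.5, -0.3, 0.0, 0.9]
    for expert, gate in top_k_route(scores, k=2):
        print(f"expert {expert}: gate weight {gate:.3f}")
```

In a real MoE layer, the token’s hidden state would pass only through the selected experts, and their outputs would be summed using these gate weights. The other experts sit idle for that token. This is why per-token compute tracks the activated parameter count, even though all 1.6T weights still have to be stored somewhere in the cluster.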

Judged on model performance alone, LongCat-2.0 is not a clean “best in the world” story. The technical community generally sees its agentic capability as close to Claude Opus 4.6 but behind the newer Claude Opus 4.8. It is also not necessarily China’s strongest coding model: the source article says community feedback places it slightly above GLM-5.1 in coding but behind GLM-5.2. It is not even the highest-usage Chinese model on OpenRouter, because free or heavily subsidized models from Tencent, Alibaba, and DeepSeek have also appeared near the top.

Add one qualifier, though, and the story changes: LongCat-2.0 is a trillion-parameter-class model with “zero Nvidia content” anywhere in the loop from training to inference. The article’s core claim is that the model ran on a domestic compute cluster from training through inference deployment. Meituan’s public model card phrases this more carefully. It says both the full training run and the large-scale deployment were built entirely on AI ASIC superpods, with pretraining on more than 35 trillion tokens and no rollbacks or irrecoverable loss spikes.

That matters because China’s earlier domestic-compute milestones were usually narrower: running inference on local chips, or doing post-training on local chips. LongCat-2.0 is presented as something more ambitious, a complete trillion-parameter training-and-serving pipeline. One developer quoted in the Chinese article put it neatly. Earlier efforts were like building a house elsewhere and then using domestic compute to decorate it. LongCat-2.0 is more like laying the foundation, building the house, moving in, and finding that it is actually livable.

Why “From Scratch” Matters

The article is careful about the hardware details, and that caution matters. Meituan’s official material says “domestic AI compute chips” and “AI ASIC superpod.” It does not publicly name the chip model, nor does it officially state the total number of cards. The widely repeated “50,000 Ascend 910C cards” figure comes from Chinese media reports and community inference. Other reports use vaguer language such as “ten-thousand-card scale,” while some put the range at 50,000 to 60,000 cards. The Ascend 910C identification is likewise a community inference, based on clues such as 200Gbps RDMA and 64GB of HBM per die. It is not a Meituan or Huawei confirmation in the LongCat context. So the precise wording should be: LongCat-2.0 appears to have been trained on a large domestic AI ASIC cluster, widely reported as roughly 50,000 cards and widely inferred to involve Huawei Ascend 910C-class hardware. That is still a big deal.
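Card counts matter because pretraining time equals total compute divided by the throughput the cluster actually sustains. A common rule of thumb puts training compute at about 6 × N × D FLOPs. This is an approximation, not a Meituan figure. Here N is the parameter count that does work per token (roughly the activated parameters, for an MoE) and D is the number of training tokens. The sketch below combines that rule with the reported scale of roughly 50,000 cards, the model card’s “more than 35 trillion tokens” (used here as a lower bound), and model FLOPs utilization (MFU), the ratio of achieved to peak FLOP/s. The per-card peak of 5e14 FLOP/s and both MFU values are placeholders, not published specifications for any Ascend part. The helper names are also hypothetical.

```python
# Back-of-envelope pretraining estimate. Every input is an assumption:
# the 6*N*D rule ignores attention and other overheads, and the per-card
# peak and MFU values are placeholders, not vendor or Meituan figures.

SECONDS_PER_DAY = 86_400


def training_flops(active_params: float, tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per activated parameter per token
    (about 2 for the forward pass and 4 for the backward pass)."""
    return 6.0 * active_params * tokens


def wall_clock_days(total_flops: float, num_cards: int,
                    peak_flops_per_card: float, mfu: float) -> float:
    """Days to finish if the cluster sustains `mfu` of its aggregate peak."""
    sustained = num_cards * peak_flops_per_card * mfu  # achieved FLOP/s
    return total_flops / sustained / SECONDS_PER_DAY


if __name__ == "__main__":
    flops = training_flops(active_params=48e9, tokens=35e12)
    print(f"approx. training compute: {flops:.2e} FLOPs")
    # Hypothetical cluster: 50,000 cards at a placeholder 5e14 peak FLOP/s each.
    for mfu in (0.20, 0.30):  # two placeholder MFU values, 1.5x apart
        days = wall_clock_days(flops, 50_000, 5e14, mfu)
        print(f"MFU {mfu:.0%}: ~{days:.1f} days")
```

Because wall-clock time scales as 1/MFU, a 1.5x MFU gain cuts training time by a third for the same hardware and token budget. That is one reason utilization figures get as much attention as raw card counts.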

The source article draws a sharp distinction between two types of breakthroughs. Earlier domestic-chip milestones often involved taking an existing large model and doing continued training or full-parameter post-training. That is difficult and valuable, but it is not the same as training a trillion-parameter model from random initialization. Meituan’s claim is much harder. It says it started from zero, trained a 1.6T-parameter model on more than 30T tokens, reduced daily failure rates by more than 70%, improved training MFU (model FLOPs utilization) by 1.5x, and finished without rollbacks or unrecoverable loss spikes.

From-scratch pretraining is brutally unforgiving. A loss spike, a communication timeout, or a silent data corruption event can waste millions in electricity and compute time. Doing it on a non-Nvidia stack means the challenge is not only raw FLOPs. It is the whole system: chips, interconnect, operators, communication libraries, fault recovery, monitoring, and training stability. That is why the article argues the key question has shifted from “can Chinese chips train a giant model?” to “can they do it stably enough to become the norm?”

The Real Bottleneck Is the Software Stack

The article does not pretend domestic...
