Meituan Trained a 1.6T-Parameter AI Model Without Nvidia GPUs


XYZ Labs


LongCat-2.0 is not the world's strongest model. The shock is that Meituan says the full training and deployment ran on domestic AI ASIC superpods.

XYZ Labs
Jul 02, 2026


Meituan just released LongCat-2.0, a 1.6-trillion-parameter foundation model. But the real story is not the parameter count.

The real story is this: according to the Chinese analysis and Meituan’s own model-card language, LongCat-2.0 was trained and deployed on AI ASIC superpods, without relying on Nvidia GPUs. For China’s AI industry, that is the line that turns a normal model launch into a geopolitical hardware story.

LongCat-2.0 uses a Mixture-of-Experts architecture with 1.6T total parameters and about 48B activated parameters per token. Before its official release, it reportedly appeared on OpenRouter under the anonymous name Owl Alpha, where it entered the top three by total usage. In Claude Code Agent scenarios, the article says LongCat-2.0 ranked second globally by usage, behind only Claude Opus 4.8.
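To put the Mixture-of-Experts figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only numbers quoted in the article (1.6T total parameters, about 48B activated per token, and the model card's "more than 35 trillion tokens"). The compute estimate relies on the common rule of thumb of roughly 6 FLOPs per active parameter per training token. That rule is an assumption brought in for illustration, not something Meituan has published, and it ignores attention and routing overhead. The function names are hypothetical.

```python
# Figures as quoted in the article; treat them as reported, not verified.
TOTAL_PARAMS = 1.6e12   # total parameters in the MoE model
ACTIVE_PARAMS = 48e9    # approximate parameters activated per token
TRAIN_TOKENS = 35e12    # lower bound: "more than 35 trillion tokens"


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_params / total_params


def approx_training_flops(active_params: float, tokens: float) -> float:
    """Rough training compute using the ~6 * N * D rule of thumb.

    Assumption: N is the number of *active* parameters per token, which is
    the usual adaptation of the rule for sparse MoE models. Attention and
    router costs are ignored, so this is an order-of-magnitude figure only.
    """
    return 6 * active_params * tokens


if __name__ == "__main__":
    frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    flops = approx_training_flops(ACTIVE_PARAMS, TRAIN_TOKENS)
    print(f"Active fraction per token: {frac:.1%}")
    print(f"Approximate training compute: {flops:.2e} FLOPs")
```

Under these assumptions, only about 3% of the parameters participate in each token, and the pretraining run lands on the order of 10^25 FLOPs. The first number is why a 1.6T-parameter model can be served at all; the second is why a stable, rollback-free run on a non-Nvidia stack is the notable claim.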

If you only look at model performance, LongCat-2.0 is not a clean “best in the world” story.

The technical community generally sees its agentic capability as close to Claude Opus 4.6, but behind the newer Claude Opus 4.8. It is also not necessarily China’s strongest coding model; the source article says community feedback places it slightly above GLM-5.1 in coding, but behind GLM-5.2.

It is not even the highest-usage Chinese model on OpenRouter, because free or heavily subsidized models from Tencent, Alibaba, and DeepSeek have also appeared near the top.

But add one qualifier and the whole story changes:

LongCat-2.0 is a trillion-parameter-class model with “zero Nvidia content” in the training-to-inference loop.

The article’s core claim is that from training to inference deployment, the model ran on a domestic compute cluster. Meituan’s public model card phrases this more carefully: both the full training run and large-scale deployment were built entirely on AI ASIC superpods, with pretraining across more than 35 trillion tokens and no rollbacks or irrecoverable loss spikes.

That matters because China’s previous domestic-compute narratives often covered narrower milestones: running inference on local chips, or doing post-training on local chips. LongCat-2.0 is presented as something more ambitious: a full trillion-parameter training-and-serving pipeline.

One developer quoted in the Chinese article put it neatly: earlier efforts were like building a house elsewhere, then using domestic compute to decorate it. LongCat-2.0 is more like laying the foundation, building the house, moving in, and finding that it is actually livable.

Why “From Scratch” Matters

The article is careful about the hardware details, and that caution is important.

Meituan’s official material says “domestic AI compute chips” and “AI ASIC superpod.” It does not publicly name the exact chip model, nor does it officially state the total number of cards.

The widely repeated “50,000 Ascend 910C cards” figure comes from Chinese media reports and community inference. Other reports use vaguer language such as “ten-thousand-card scale,” while some put the range around 50,000 to 60,000 cards. The Ascend 910C identification is also a community inference based on clues such as 200Gbps RDMA and 64GB HBM per die, not a Meituan or Huawei confirmation in the LongCat context.

So the precise wording should be: LongCat-2.0 appears to have been trained on a large domestic AI ASIC cluster, widely reported as roughly 50,000 cards and widely inferred to involve Huawei Ascend 910C-class hardware.

That is still a big deal.

The source article draws a sharp distinction between two types of breakthroughs.

Earlier domestic-chip milestones often involved taking an existing large model and doing continued training or full-parameter post-training. That is difficult and valuable, but it is not the same as training a trillion-parameter model from random initialization.

Meituan’s claim is much harder: starting from zero, training a 1.6T-parameter model on more than 30T tokens, reducing daily failure rates by more than 70%, improving training MFU (model FLOPs utilization) by 1.5x, and completing the process without rollback or unrecoverable loss spikes.

From-scratch pretraining is brutally unforgiving. A loss spike, communication timeout, or silent data corruption event can waste millions in electricity and compute time. Doing it on a non-Nvidia stack means the challenge is not only raw FLOPs. It is the whole system: chips, interconnect, operators, communication libraries, fault recovery, monitoring, and training stability.

That is why the article argues the key question has shifted from “can Chinese chips train a giant model?” to “can they do it stably enough to become normal?”

The Real Bottleneck Is the Software Stack

The article does not pretend domestic...
