Qwen-AgentWorld-35B-A3B: a local 'world model' you can run at home




Qwen shipped something on June 22 that does not behave like the chat models you are used to, and the most common reaction in the first few days has been a confused "wait, what is this even for?" Qwen-AgentWorld-35B-A3B is not a coding assistant or a general chatbot. Alibaba calls it a language world model: a model trained to predict what an environment will do next when an agent takes an action, rather than to pick the action itself. It is Apache-2.0, the weights are on Hugging Face, GGUF quants already exist, and the active-parameter count is small enough that a used GPU can run it fast. So it is worth a serious look, as long as you understand what you are downloading.

We have not run it first-hand. What follows is built from Qwen's model card and technical report, the early community reaction, and the hardware math, with every source linked at the bottom.

What Qwen claims

The pitch, in Alibaba's framing, is that a regular LLM picks the next action, while a world model predicts the next state. Feed it the current screen or terminal output plus a proposed action, and it simulates what happens next. Qwen says it covers seven agent domains in one model: MCP tool-calling, Search, Terminal, software engineering (SWE), Android, Web, and OS, spanning both text environments and GUI ones. The training pipeline is described as three stages: continued pre-training to inject environment knowledge, supervised fine-tuning to activate next-state prediction, and reinforcement learning to sharpen simulation fidelity, over more than 10 million real interaction trajectories.

The numbers that matter for a local runner: 35 billion total parameters, roughly 3 billion active per token, a mixture-of-experts design with 256 experts (8 routed plus 1 shared activated), and a native context window of 262,144 tokens (Qwen's docs suggest 128K as the practical floor for the intended use).
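
To make those parameter counts concrete, here is a back-of-the-envelope sketch of the weight memory they imply. The 35B and 3B figures are the ones quoted above; the bit widths are illustrative assumptions rather than specific GGUF quant types, the helper name `weight_gb` is ours, and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory arithmetic for a mixture-of-experts model.
# Parameter counts are the figures quoted in the article; the bit
# widths below are assumed round numbers, not real quant formats.

TOTAL_PARAMS = 35e9   # all experts must be stored somewhere
ACTIVE_PARAMS = 3e9   # approximate parameters used per token


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Decimal gigabytes needed to store `params` weights at a given width."""
    return params * bits_per_weight / 8 / 1e9


for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    total = weight_gb(TOTAL_PARAMS, bits)
    active = weight_gb(ACTIVE_PARAMS, bits)
    print(f"{label}: ~{total:.1f} GB stored, ~{active:.1f} GB touched per token")
```

The two columns are the point of the exercise: the total figure decides whether the model fits in memory at all, while the much smaller active figure is a loose proxy for how much weight data each generated token has to touch, which is why a sparse 35B can feel faster than its size suggests.
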
There is a much larger sibling, Qwen-AgentWorld-397B-A17B, which is the headline-benchmark model; the 35B-A3B is the one most people can realistically run.

On Qwen's own benchmark, AgentWorldBench, the 35B-A3B scores 56.39 out of 100 overall, with the strongest results on OS (65.92), SWE (65.63), and MCP tool-calling (64.79), and the weakest on Search (36.69). Treat those as the creator's claimed numbers on the creator's own test, not independent results.

The architecture is the interesting part

The spec sheet hides the genuinely novel bit. The model card describes the stack as a hybrid of Gated DeltaNet blocks and gated attention blocks feeding the MoE layers, not standard full attention all the way down. Gated DeltaNet is a linear-attention variant, and the practical payoff for anyone running this at home is the KV cache. Full attention grows the KV cache linearly with context length, which is what makes long-context runs eat memory. Linear-attention layers carry a fixed-size state instead, so in a hybrid only the attention blocks contribute a cache that grows, and that keeps the growth in check. Pair that with a 256K native window and you have a model designed to chew through long agent traces (a full terminal session, a multi-step browser task) without the memory blowup a full-attention 35B would hit at the same context.

That is also why the "world model" label is doing real work here, and why some people push back on it (more on that below). If you want the deeper version, the architecture and the linear-attention trade-off are the same family of ideas we walk through in our quantization guide and the VRAM-sizing explainer.

The research behind it

There is a real paper. arXiv:2606.24597, "Qwen-AgentWorld: Language World Models for General Agents" (Zuo, Xiao, Sheng, Huang and 29 co-authors at Alibaba, submitted June 23, 2026), lays out the world-model framing: predict environment dynamics from the current observation and action, and use that as a cognitive substrate for planning. The report covers both the 35B-A3B and the 397B-A17B and the CPT/SFT/RL recipe.
We verified the paper resolves and read the abstract and framing; some of the deepest architecture details sit in the full PDF rather than the landing page, so we attribute the Gated DeltaNet specifics to the model card.

What the early community is saying

It is four days old, so there is no settled verdict yet. The signal so far is split, which is the useful part.

On the positive side, the Hugging Face discussions tab has an "Awesome model" thread with real engagement, users reporting it works well dropped into agent frameworks, and people already swapping chat-template recipes for the agent loop. On Hacker News, one commenter called it "completely underrated news" and pointed at the practical angle: a cheap world model could help smaller agent models keep track of workflow state. Several people reported running quantized versions on gaming GPUs within days, with unsloth and other community quants landing fast.

The skeptics are worth listening to. On the Hacker News thread, one user argued Qwen "has decided to rebrand...

