Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents - Firethering
Saturday, May 23, 2026
Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents
By Mohit Geryani
May 23, 2026
Last updated: May 23, 2026
Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems. Data analysis across spreadsheets. Multi-session memory that has to hold up in production. Command A+ is what came out of that: a model shaped by a year of watching enterprise workflows break and figuring out why.
The result is a 218B mixture-of-experts model with 25B active parameters at inference time, available today on Hugging Face under Apache 2.0. It replaces five separate models in the Command A family, each of which handled one thing. This one handles all of them, and on most of the tasks those specialist models were built for, it wins.
Five models became one
The Command A family going into this release was fragmented: Command A for general use, Reasoning for complex problem solving, Vision for multimodal, Translate for multilingual, and tool use handled separately. Five models, with five sets of infrastructure to manage.
Command A+ consolidates all of it. One model, support for 48 languages (up from 23), multimodal reasoning included, tool use built in, reasoning mode available. For an enterprise team managing private deployments, that matters. Fewer models means fewer hardware configurations and fewer versioning headaches.
The consolidation only works if the unified model actually matches the specialists. On the agentic tasks that matter most for North, it doesn’t just match them. Agentic QA accuracy improved 20% over Command A Reasoning. Spreadsheet analysis quality improved 32%. Memory performance, which tests whether the model can use context from a previous session to answer questions in a new one, jumped from 39% to 54%. These are meaningful gains over the specialist it replaced.
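The paragraph above mixes two kinds of numbers: improvements quoted as a single percentage (20%, 32%) and a before/after score (39% to 54%). A small sketch of the difference between a gain in percentage points and a relative gain, using only the memory figures quoted above. The function names are illustrative, and the article does not say which convention the 20% and 32% figures use.

```python
def point_gain(before_pct: float, after_pct: float) -> float:
    """Absolute gain, in percentage points."""
    return after_pct - before_pct


def relative_gain_pct(before_pct: float, after_pct: float) -> float:
    """Gain expressed as a percentage of the starting score."""
    return (after_pct - before_pct) / before_pct * 100


# Memory benchmark scores quoted in the article.
memory_before, memory_after = 39.0, 54.0

print(f"absolute gain: {point_gain(memory_before, memory_after):.0f} percentage points")
print(f"relative gain: {relative_gain_pct(memory_before, memory_after):.1f}%")
```

Read this way, the memory result is a 15-point gain, or roughly 38% relative to the old score.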
The efficiency numbers
218B total parameters sounds like a cluster problem. It isn’t, and that distinction is the whole point of the MoE architecture here.
In a dense model, every parameter fires for every token. Command A+ activates 25B parameters per token at inference time and leaves the rest idle. The practical result is that it runs on two NVIDIA H100s at W4A4 quantization, or a single Blackwell GPU, with what Cohere describes as an imperceptible quality difference versus the full-precision version. For teams trying to deploy privately, on their own hardware, without routing sensitive data through an external API, that minimum spec changes the conversation.
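A rough back-of-the-envelope check on that hardware claim, as a sketch rather than a sizing guide. It assumes 80 GB of memory per H100, counts weight storage only (no KV cache, activations, or runtime overhead), and assumes the full set of expert weights stays resident in GPU memory even though only a fraction is active per token. None of these assumptions come from Cohere.

```python
# Figures from the article.
TOTAL_PARAMS = 218e9   # total parameters
ACTIVE_PARAMS = 25e9   # parameters active per token

# Assumption (not from the article): 80 GB of memory per H100.
H100_MEMORY_GB = 80


def weight_footprint_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (10^9 bytes); weights only."""
    return num_params * bits_per_weight / 8 / 1e9


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
fp16_gb = weight_footprint_gb(TOTAL_PARAMS, 16)
w4_gb = weight_footprint_gb(TOTAL_PARAMS, 4)
two_h100_gb = 2 * H100_MEMORY_GB

print(f"active fraction per token: {active_fraction:.1%}")
print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"4-bit weights:  {w4_gb:.0f} GB vs {two_h100_gb} GB on two H100s")
```

Under these assumptions the 4-bit weights come to about 109 GB, which fits within 160 GB with room left over, while 16-bit weights at about 436 GB would not. The 25B active parameters reduce compute per token, not the amount of memory needed to hold the model.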
Command A+ is also meaningfully faster than its predecessor. Against Command A Reasoning at the same quantization and concurrency levels, Command A+ delivers up to 63% higher output tokens per second and cuts time to first token by up to 17%. The W4A4 quantization adds another 47% speed increase on top of that. Cohere also used speculative decoding optimized specifically for the MoE architecture, adding a further 1.5 to 1.6x inference speedup.
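To get a feel for how these figures could stack, here is a hedged sketch. It assumes the three speedups are independent and multiply cleanly, which the article does not state, and the quoted numbers are "up to" values, so the result is an optimistic ceiling rather than a reported benchmark. The 500 ms baseline for time to first token is purely hypothetical.

```python
from math import prod

# "Up to" figures quoted in the article, expressed as multipliers.
THROUGHPUT_GAIN = 1.63             # up to 63% higher output tokens per second
W4A4_GAIN = 1.47                   # another 47% from W4A4 quantization
SPEC_DECODING_RANGE = (1.5, 1.6)   # speculative decoding speedup range


def stacked_speedup(*factors: float) -> float:
    """Combined speedup, ASSUMING the factors multiply independently."""
    return prod(factors)


def reduced_latency_ms(baseline_ms: float, cut: float = 0.17) -> float:
    """Time to first token after an 'up to 17%' reduction."""
    return baseline_ms * (1 - cut)


low = stacked_speedup(THROUGHPUT_GAIN, W4A4_GAIN, SPEC_DECODING_RANGE[0])
high = stacked_speedup(THROUGHPUT_GAIN, W4A4_GAIN, SPEC_DECODING_RANGE[1])

print(f"best-case stacked throughput: {low:.2f}x to {high:.2f}x")
# Hypothetical 500 ms baseline, for illustration only.
print(f"time to first token: 500 ms -> {reduced_latency_ms(500):.0f} ms")
```

If the gains did compound fully, the ceiling would land somewhere around 3.6x to 3.8x. Real deployments will likely see less, since each figure was presumably measured under its own conditions.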
There’s also a new tokenizer. Command A+ is the first Cohere model to use it, and the compression gains matter especially for non-European languages: Arabic tokenization improved 20%, Korean 16%, and Japanese 18%. Fewer tokens per response means lower inference cost per query, which compounds quickly at enterprise scale.
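A minimal sketch of what those tokenizer gains could mean in practice. It assumes "improved 20%" means the same text needs 20% fewer tokens, which is one plausible reading the article does not confirm, and it uses a hypothetical 1,000-token response as the baseline.

```python
# Improvements quoted in the article, read here as a fractional
# reduction in token count for the same text (an assumption).
TOKEN_REDUCTION = {"Arabic": 0.20, "Korean": 0.16, "Japanese": 0.18}

# Hypothetical response length under the old tokenizer.
OLD_TOKENS = 1000


def tokens_with_new_tokenizer(old_tokens: float, reduction: float) -> float:
    """Token count for the same text after the assumed reduction."""
    return old_tokens * (1 - reduction)


for language, reduction in TOKEN_REDUCTION.items():
    new_tokens = tokens_with_new_tokenizer(OLD_TOKENS, reduction)
    print(f"{language}: {OLD_TOKENS} tokens -> {new_tokens:.0f} tokens")
```

If inference cost scales linearly with token count, the per-query cost for that text would fall by the same fraction.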
Where it’s genuinely strong
The benchmark Cohere is most confident about is the one that’s hardest to fake: τ²-Bench Telecom, which tests multi-step agentic task completion in realistic enterprise scenarios. Command A Reasoning scored 37% on it. Command A+ scores 85%. That’s not an incremental gain. It’s a different category of capability on the task the model was explicitly built for.
Terminal-Bench Hard went from 3% to 25%. That’s still...