Cohere Open-Sources Command A+, a 218B MoE Model That Runs on Two H100s

Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents - Firethering



By Mohit Geryani

May 23, 2026

Last updated: May 23, 2026


Cohere spent the past year deploying North, its enterprise AI workspace, with actual customers doing actual work. Agentic question answering over company file systems. Data analysis across spreadsheets. Multi-session memory that has to hold up in production. Command A+ is what came out of that, a model shaped by a year of watching enterprise workflows break and figuring out why.

The result is a 218B mixture-of-experts model with 25B active parameters at inference time, available today on Hugging Face under Apache 2.0. It replaces five separate models in the Command A family, each of which handled one thing. This one handles all of them, and on most of the tasks those specialist models were built for, it wins.


Five models became one

The Command A family going into this release was fragmented: Command A for general use, Reasoning for complex problem solving, Vision for multimodal, Translate for multilingual, and tool use came in separately. Five models with five sets of infrastructure to manage.

Command A+ consolidates all of it. One model, support for 48 languages (up from 23), multimodal reasoning included, tool use built in, reasoning mode available. For an enterprise team managing private deployments, that matters. Fewer models means fewer hardware configurations and fewer versioning headaches.

The consolidation only works if the unified model actually matches the specialists. On the agentic tasks that matter most for North, it doesn’t just match them. Agentic QA accuracy improved 20% over Command A Reasoning. Spreadsheet analysis quality improved 32%. Memory performance, which tests whether the model can use context from a previous session to answer questions in a new one, jumped from 39% to 54%. These are meaningful gains over the specialist it replaced.


The efficiency numbers

218B total parameters sounds like a cluster problem. It isn’t, and that distinction is the whole point of the MoE architecture here.

In a dense model every parameter fires for every token. Command A+ activates 25B parameters at inference time and leaves the rest idle. The practical result is that it runs on two NVIDIA H100s at W4A4 quantization, or a single Blackwell GPU, with what Cohere describes as an imperceptible quality difference versus the full precision version. For teams trying to deploy privately, on their own hardware, without routing sensitive data through an external API, that minimum spec changes the conversation.
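A rough back-of-the-envelope check shows why the two-H100 figure is plausible. The sketch below is only an estimate under stated assumptions: it counts weight storage alone (no KV cache, activations, or quantization scale overhead), it assumes the 80 GB H100 variant, and the helper name `weight_memory_gb` is made up for illustration. Note that in an MoE model all experts still have to sit in memory, even though only a fraction is active per token.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 218e9    # total parameters, as reported in the article
ACTIVE_PARAMS = 25e9    # parameters active per token, as reported in the article
H100_MEMORY_GB = 80     # assumption: the 80 GB H100 variant
NUM_GPUS = 2

# Every expert must be resident in memory, so the total count drives the footprint.
w4_total_gb = weight_memory_gb(TOTAL_PARAMS, bits_per_weight=4)
bf16_total_gb = weight_memory_gb(TOTAL_PARAMS, bits_per_weight=16)

# Only the active fraction drives per-token compute.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

available_gb = NUM_GPUS * H100_MEMORY_GB
fits_at_4_bit = w4_total_gb < available_gb
fits_at_16_bit = bf16_total_gb < available_gb

print(f"4-bit weights:  {w4_total_gb:.0f} GB (fits in {available_gb} GB: {fits_at_4_bit})")
print(f"16-bit weights: {bf16_total_gb:.0f} GB (fits in {available_gb} GB: {fits_at_16_bit})")
print(f"Active fraction per token: {active_fraction:.1%}")
```

Under these assumptions the 4-bit weights come to about 109 GB, leaving headroom on 160 GB of combined memory, while 16-bit weights at about 436 GB would not fit on two cards.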

Speed is also meaningfully better than its predecessor. Against Command A Reasoning at the same quantization and concurrency levels, Command A+ delivers up to 63% higher output tokens per second and cuts time to first token by up to 17%. The W4A4 quantization adds another 47% speed increase on top of that. Cohere also used speculative decoding optimized specifically for the MoE architecture, adding a further 1.5 to 1.6x inference speedup.
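To get a feel for how these figures might stack, here is a minimal sketch that multiplies them together. Treating the gains as independent multiplicative factors is an assumption made for illustration; the article does not say the numbers compound this way, each is an "up to" figure, and real deployments will likely see less. The function name `combined_speedup` is hypothetical.

```python
from math import prod


def combined_speedup(factors):
    """Multiply speedup factors, assuming they compound independently."""
    return prod(factors)


# Figures from the article, expressed as multipliers.
ARCHITECTURE_GAIN = 1.63         # "up to 63% higher output tokens per second"
W4A4_GAIN = 1.47                 # "another 47% speed increase"
SPEC_DECODING_GAIN = (1.5, 1.6)  # "a further 1.5 to 1.6x"

low = combined_speedup([ARCHITECTURE_GAIN, W4A4_GAIN, SPEC_DECODING_GAIN[0]])
high = combined_speedup([ARCHITECTURE_GAIN, W4A4_GAIN, SPEC_DECODING_GAIN[1]])

print(f"Best-case combined throughput gain: {low:.2f}x to {high:.2f}x")
```

If the gains did compound fully, the best case would land at roughly 3.6x to 3.8x the output throughput of Command A Reasoning. That is an upper bound under the stated assumption, not a measured result.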

There’s also a new tokenizer. Command A+ is the first Cohere model to use it, and the compression gains matter especially for non-European languages: Arabic tokenization improved 20%, Korean 16%, Japanese 18%. Fewer tokens per response means lower inference cost per query, which compounds quickly at enterprise scale.
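The cost effect can be sketched with simple arithmetic. The example below assumes that "improved 20%" means 20% fewer tokens for the same text, and that cost scales linearly with token count; neither is spelled out in the article. The baseline of 1,000 tokens and the helper name `tokens_after_compression` are hypothetical values chosen for illustration.

```python
def tokens_after_compression(tokens_before: float, improvement: float) -> float:
    """Token count for the same text, assuming `improvement` is the fraction of tokens saved."""
    return tokens_before * (1.0 - improvement)


# Improvements reported in the article, read as fractional token savings (assumption).
IMPROVEMENTS = {"Arabic": 0.20, "Korean": 0.16, "Japanese": 0.18}

BASELINE_TOKENS = 1_000  # hypothetical response length under the old tokenizer

results = {
    language: tokens_after_compression(BASELINE_TOKENS, gain)
    for language, gain in IMPROVEMENTS.items()
}

for language, tokens in results.items():
    saved = BASELINE_TOKENS - tokens
    print(f"{language}: {tokens:.0f} tokens instead of {BASELINE_TOKENS} ({saved:.0f} saved)")
```

Because the saving applies to every response, a fixed per-token price would translate it directly into the same percentage reduction in output cost for those languages.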


Where it’s genuinely strong

The benchmark Cohere is most confident about is the one that’s hardest to fake: τ²-Bench Telecom, which tests multi-step agentic task completion in realistic enterprise scenarios. Command A Reasoning scored 37% on it. Command A+ scores 85%. That’s not an incremental gain; it’s a different category of capability on the task the model was explicitly built for.

Terminal-Bench Hard went from 3% to 25%. That’s still...

