Cohere's open agentic North Mini Code – accelerated with NVFP4 on spark-arena


North Mini Code in NVFP4 — ~1.65x over FP8, 40% less memory, zero quality loss - DGX Spark / GB10 - NVIDIA Developer Forums


jeremyk

June 17, 2026, 4:26pm

Hey all,

I just put up two Spark Arena runs of North Mini Code 1.0 (an FP8 reference and an NVFP4 quant we made) to see what the GB10’s native FP4 support buys us. North Mini Code is Cohere’s first open agentic coding model: a 30B MoE (3B active), Apache 2.0, built for exactly the kind of run-it-yourself, sovereign setup the Spark is great for. Cohere’s blog post: “North Mini Code: Agentic Coding Model for Developers”.

The results, same model / same recipe / same Spark, only the quant changed:

Single user @ 16K context (realistic): ~52 tok/s on NVFP4 vs ~32 on FP8 → ~1.65x faster

Two concurrent users: scales to ~84 tok/s aggregate (the Spark Arena figure)

Memory: 17 GB weights vs 28 GB → ~40% smaller footprint

Quality: identical HumanEval scores on NVFP4 and FP8, so no measurable loss on that benchmark
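The ratios above can be re-derived from the rounded figures in the list. This is only a quick arithmetic sketch: the helper names are made up for illustration, and the per-user number assumes the two concurrent users split the aggregate throughput evenly, which the post does not state.

```python
def speedup(new_tok_s: float, base_tok_s: float) -> float:
    """Throughput ratio of the new configuration over the baseline."""
    return new_tok_s / base_tok_s


def reduction(new_size: float, base_size: float) -> float:
    """Fractional size reduction of the new configuration vs the baseline."""
    return 1.0 - new_size / base_size


# Rounded figures quoted in the post.
NVFP4_TOK_S, FP8_TOK_S = 52.0, 32.0   # single user, 16K context
NVFP4_GB, FP8_GB = 17.0, 28.0         # weight footprint
AGGREGATE_TOK_S, USERS = 84.0, 2      # two concurrent users

if __name__ == "__main__":
    print(f"decode speedup:   {speedup(NVFP4_TOK_S, FP8_TOK_S):.3f}x")
    print(f"memory reduction: {reduction(NVFP4_GB, FP8_GB):.1%}")
    # Assumption: both users get an equal share of the aggregate.
    print(f"per-user tok/s:   {AGGREGATE_TOK_S / USERS:.0f}")
```

With these rounded inputs the speedup comes out a little above 1.6x and the memory reduction a little under 40%, consistent with the "~" figures quoted, which are presumably computed from unrounded measurements.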

Benchmarks & Recipe:

FP8: CohereLabs/North-Mini-Code-1.0-fp8 - Spark Arena Benchmark

NVFP4: XanuNetworks/North-Mini-Code-1.0-NVFP4 - Spark Arena Benchmark

Both run on a single Spark (tensor parallel 1) under vLLM with an FP8 KV cache, with tool calling and reasoning handled by the cohere_command4 parsers. Recipes and full PP/TG-vs-concurrency logs (prompt processing and token generation) are on both pages if you want to reproduce.
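As a starting point before opening the recipe pages, here is a rough sketch of how the setup described above might map onto a `vllm serve` command line. This is an assumption, not the recipe from the Spark Arena pages: `build_vllm_serve_command` is a hypothetical helper, the model string is assumed to be usable as a model ID, and the `cohere_command4` parser name is taken from the post, so it needs a vLLM build that actually ships that parser. The flags themselves are standard vLLM server options.

```python
import shlex


def build_vllm_serve_command(model_id: str, parser: str = "cohere_command4") -> list[str]:
    """Assemble an argv list for `vllm serve` matching the described setup.

    Hypothetical helper: the real recipe may set additional options
    (context length, memory utilization, and so on) not shown here.
    """
    return [
        "vllm", "serve", model_id,
        "--tensor-parallel-size", "1",   # single Spark
        "--kv-cache-dtype", "fp8",       # FP8 KV cache
        "--enable-auto-tool-choice",     # let the model decide when to call tools
        "--tool-call-parser", parser,    # parser name as given in the post
        "--reasoning-parser", parser,
    ]


if __name__ == "__main__":
    # Assumption: the Spark Arena entry name doubles as the model ID.
    cmd = build_vllm_serve_command("XanuNetworks/North-Mini-Code-1.0-NVFP4")
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere; the Spark Arena recipe remains the authoritative reference for reproduction.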

Fun side note: looks like this is the only Cohere model on the board so far, so a shout out to the Cohere folks for putting out such a solid little agentic coding model. Getting ~1.65x and a 40% smaller footprint for no quality hit makes it a really nice fit for the Spark.

Would love to hear how it runs on other people’s setups, and if anyone wants to stress the quant on heavier coding workloads than HumanEval, I’m all ears. Feedback welcome!

Cheers!

coder543

June 17, 2026, 5:23pm

I think any 4-bit quant can get those output tok/s benefits, since token generation is just memory-bandwidth bound and 4-bit quants of the same model are all about the same size.
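The bandwidth argument can be put into a back-of-envelope estimate: if every generated token has to stream the active weights from memory once, the decode ceiling is bandwidth divided by bytes read per token. The sketch below rests on stated assumptions, not measurements: 273 GB/s is the published GB10 memory bandwidth, 3B active parameters comes from the original post, and 4.5 bits per weight models NVFP4 as 4-bit values plus one 8-bit scale per 16-element block. KV cache reads, activations, and any layers left unquantized are ignored.

```python
def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode tok/s if each token streams the active weights once."""
    bytes_per_token = active_params_b * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


BANDWIDTH_GB_S = 273.0   # assumption: published GB10 memory bandwidth
ACTIVE_PARAMS_B = 3.0    # active parameters per token, from the original post
FP8_BITS = 8.0
NVFP4_BITS = 4.5         # assumption: 4-bit values + 8-bit scale per 16 weights

if __name__ == "__main__":
    fp8 = decode_ceiling_tok_s(ACTIVE_PARAMS_B, FP8_BITS, BANDWIDTH_GB_S)
    nvfp4 = decode_ceiling_tok_s(ACTIVE_PARAMS_B, NVFP4_BITS, BANDWIDTH_GB_S)
    print(f"FP8 ceiling:   {fp8:.0f} tok/s")
    print(f"NVFP4 ceiling: {nvfp4:.0f} tok/s")
    print(f"ceiling ratio: {nvfp4 / fp8:.2f}x")
```

Under these assumptions the ceiling ratio depends only on bits per weight, so any 4-bit format with similar scale overhead would land in roughly the same place. The measured numbers sit well below both ceilings, which is expected given everything the estimate leaves out.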

I could be wrong, but I think the real potential benefit of NVFP4 is more efficient use of the tensor cores for prefill (prompt processing). It would be interesting to see how many tokens/sec you’re getting for that.

Unfortunately, in my testing, North Mini Code just doesn’t seem to be good enough for me to have any great use for it yet, but I look forward to a future version 2.

jeremyk

June 17, 2026, 5:29pm

NVFP4 PP:

[Screenshot: NVFP4 prompt-processing results]

FP8 PP:

[Image: FP8 prompt-processing results]
