Nvidia Nemotron 3 Super

We are releasing NVIDIA Nemotron 3 Super, a Mixture-of-Experts hybrid Mamba-Transformer model with 12B active and 120B total parameters. Nemotron 3 Super is part of the Nemotron 3 series of models and is the first model in the series that:

Leverages LatentMoE for improved accuracy.

Includes multi-token prediction (MTP) layers for faster inference through native speculative decoding.

Is pretrained in NVFP4.
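To make the speculative decoding idea concrete, here is a minimal sketch of a generic greedy draft-then-verify loop. It is not Nemotron 3 Super's implementation: the `draft` function only stands in for a cheap proposer such as MTP layers, and `toy_target`, `toy_draft`, and the parameter `k` are hypothetical names chosen for illustration.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy next-token function


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[Token], k: int) -> List[Token]:
    """Run one draft-then-verify step and return the newly accepted tokens."""
    # 1. The cheap draft proposes k tokens autoregressively.
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(context + proposed))

    # 2. The target checks each proposed position. A real system would do this
    #    in one batched forward pass; a loop is used here for clarity.
    accepted: List[Token] = []
    for tok in proposed:
        expected = target(context + accepted)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # 3. Every proposal matched, so the target also contributes one more token.
    accepted.append(target(context + accepted))
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[Token],
             max_new_tokens: int, k: int = 3) -> List[Token]:
    """Generate greedily, accepting several tokens per step when the draft is right."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new_tokens:
        out.extend(speculative_step(target, draft, out, k))
    return out[:len(prompt) + max_new_tokens]


# Hypothetical toy models: the target counts upward modulo 10, and the draft
# agrees with it except right after the token 4.
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10
```

In this greedy variant the output is identical to decoding with the target alone. The speedup comes only from accepting several tokens per target pass when the draft guesses correctly.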

Key Highlights

Nemotron 3 Super achieves up to 2.2x and 7.5x higher inference throughput than GPT-OSS-120B and Qwen3.5-122B, respectively, in the 8k-token input / 64k-token output setting.

Nemotron 3 Super achieves accuracy higher than or comparable to GPT-OSS-120B and Qwen3.5-122B across a diverse set of benchmarks.

Nemotron 3 Super supports a context length of up to 1M tokens and outperforms both GPT-OSS-120B and Qwen3.5-122B on RULER at 1M context length.

Open Source

We are releasing the pre-trained, post-trained, and quantized checkpoints along with the datasets used for training.

Checkpoints:

Nemotron 3 Super 120B-A12B NVFP4: post-trained and NVFP4-quantized model

Nemotron 3 Super 120B-A12B FP8: post-trained and FP8-quantized model

Nemotron 3 Super 120B-A12B BF16: post-trained model

Nemotron 3 Super 120B-A12B Base BF16: base model

Qwen3-Nemotron-235B-A22B-GenRM-2603: GenRM used for RLHF
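As a rough guide to how the three Nemotron 3 Super precisions compare, the sketch below estimates weight storage from the 120B total parameter count. It assumes every weight is stored at the format's nominal bit width (4, 8, or 16 bits) and ignores quantization scale factors and any layers kept at higher precision, so actual checkpoint sizes will differ. The helper `weight_gb` is a hypothetical name.

```python
PARAMS_TOTAL = 120e9  # total parameters, from the announcement

# Nominal bits per weight for each released format (assumption: applied uniformly).
BITS_PER_WEIGHT = {"NVFP4": 4, "FP8": 8, "BF16": 16}


def weight_gb(num_params: float, bits: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{weight_gb(PARAMS_TOTAL, bits):.0f} GB")
```

Under these assumptions the estimates are about 60 GB, 120 GB, and 240 GB respectively. Because only 12B parameters are active per token, compute per token is far lower than the total size suggests, but all expert weights still need to be resident.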

Data:

Nemotron-Pretraining-Specialized-v1.1: a collection of synthetic datasets aimed at improving LLM capabilities in code concepts and algorithms, formal logic, economics, and multiple-choice questions.

Nemotron-Super-Post-Training-Data: a collection of RL environments and SFT datasets targeting a broad range of agentic capabilities.
