DiffusionGemma

Google DeepMind

An experimental open model that explores an exceptionally fast approach to text generation.

DiffusionGemma abandons the sequential, token-by-token process of typical autoregressive Large Language Models.

Built on Gemma 4 and Gemini Diffusion research, it prioritizes unprecedented speed and parallel layout generation, unlocking novel workflows for developers building real-time interactive AI applications.

A non-sequential transformer that generates entire paragraphs at once rather than guessing one token at a time, which helps keep the output globally consistent.

Blazing fast inference

By shifting the decode bottleneck from memory bandwidth to raw compute, DiffusionGemma delivers up to 4-5x faster token output on NVIDIA GPUs, exceeding 1,000 tokens per second on a single H100.
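One way to see why producing many tokens per forward pass moves decoding from memory-bound toward compute-bound is to count the arithmetic done per byte of weights read. The sketch below uses the common approximation of 2 FLOPs per parameter per token for matmul-dominated layers. The function name and the 2-byte (16-bit) weight size are illustrative assumptions, and the estimate ignores MoE routing, attention cost, and activation traffic.

```python
def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """Approximate FLOPs performed per byte of weights read in one forward pass.

    Assumes each weight is read once per pass and contributes one multiply
    and one add (2 FLOPs) for every token processed in that pass.
    """
    flops_per_param = 2 * tokens_per_pass
    return flops_per_param / bytes_per_param


# One token per pass (autoregressive decoding at batch size 1), 16-bit weights.
print(arithmetic_intensity(tokens_per_pass=1, bytes_per_param=2.0))    # 1.0

# 256 tokens per pass, same weights: 256x more compute per byte fetched.
print(arithmetic_intensity(tokens_per_pass=256, bytes_per_param=2.0))  # 256.0
```

A higher ratio means the GPU spends more of its time on math and less waiting for weights to arrive from memory. It does not by itself predict tokens per second, because a diffusion model may need several refinement passes per block.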

Accessible hardware footprint

DiffusionGemma is a Mixture of Experts (MoE) model with 26B total parameters, of which only 3.8B are active during inference. When quantized, it fits comfortably within 24GB of VRAM, so it can run on a consumer NVIDIA RTX 4090 or RTX 5090.
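The 24GB claim can be sanity-checked with back-of-envelope arithmetic on the weights alone. This sketch assumes all 26B parameters must be resident in VRAM (typical for MoE models, since any expert may be selected) and counts only weight storage. Activations, caches, and quantization scale factors add overhead that is not modeled here. The helper name is hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_param / 8 / 1e9


TOTAL_PARAMS = 26e9  # total parameter count quoted above
VRAM_GB = 24         # VRAM budget quoted above

for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    print(f"{bits}-bit weights: {gb:.1f} GB, under {VRAM_GB} GB: {gb < VRAM_GB}")
# 16-bit weights: 52.0 GB, under 24 GB: False
# 8-bit weights: 26.0 GB, under 24 GB: False
# 4-bit weights: 13.0 GB, under 24 GB: True
```

Under these assumptions, only a roughly 4-bit quantization leaves headroom within 24GB, which is consistent with the "when quantized" qualifier.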

Bi-directional attention

Each forward pass generates 256 tokens in parallel, and every token can attend to all the others. This is a significant advantage in non-linear tasks such as in-line editing and code infilling.
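The difference from autoregressive decoding shows up in the attention mask. The following is a minimal illustration in plain Python, not DiffusionGemma's actual implementation. The function names are made up for this example, and a 4-token block stands in for the 256-token block.

```python
def causal_mask(n: int) -> list[list[bool]]:
    """Autoregressive mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list[list[bool]]:
    """Full mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask: list[list[bool]], i: int) -> list[int]:
    """Indices that position i is allowed to attend to."""
    return [j for j, allowed in enumerate(mask[i]) if allowed]


n = 4  # tiny block for readability
print(visible_positions(causal_mask(n), 1))         # [0, 1]
print(visible_positions(bidirectional_mask(n), 1))  # [0, 1, 2, 3]
```

For code infilling, the second case is the relevant one: a token in the middle of a gap can condition on the code that follows it as well as the code that precedes it.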

Intelligent self-correction

The model iteratively refines its own output. Because it evaluates the entire text block at once, it can close complex formatting correctly and fix mistakes in real time.
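A toy sketch can make the refinement loop concrete. It assumes a confidence-based scheme in which every position is re-predicted on each pass and low-confidence guesses are masked again for another try. Whether DiffusionGemma works this way is an assumption. `toy_model` is a hypothetical stand-in that "knows" the target string and is confident only at the block edges or next to an already settled neighbor.

```python
MASK = "_"


def toy_model(draft, target):
    """Hypothetical denoiser: returns a (token, confidence) pair per position."""
    n = len(draft)
    proposals = []
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        has_context = any(draft[j] != MASK for j in neighbors)
        if i in (0, n - 1) or has_context:
            proposals.append((target[i], 0.9))  # confident, correct guess
        else:
            proposals.append(("?", 0.2))        # unsure, wrong guess
    return proposals


def refine(target, threshold=0.5, max_passes=20):
    """Re-predict the whole block each pass; re-mask low-confidence tokens."""
    draft = [MASK] * len(target)
    for passes in range(1, max_passes + 1):
        proposals = toy_model(draft, target)
        draft = [tok if conf >= threshold else MASK for tok, conf in proposals]
        if MASK not in draft:
            return draft, passes
    return draft, max_passes


draft, passes = refine(list("[(ab)]"))
print("".join(draft), passes)  # [(ab)] 3
```

In this toy run the outer brackets settle first, then the inner pair, then the contents, so the closing delimiters are placed with knowledge of the opening ones. The six-token block finishes in three passes rather than six sequential steps.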

Next-gen compute with NVFP4

Native support for NVIDIA's new NVFP4 (4-bit floating-point) format on Blackwell GPUs substantially increases compute throughput, letting the model run faster with near-lossless accuracy.
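To give a feel for what a block-scaled 4-bit floating-point format does to values, here is a simplified simulation. It assumes the layout NVIDIA has publicly described for NVFP4: E2M1 elements (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) that share one scale per block of 16 values. As a simplification, the scale is kept as an ordinary Python float instead of the reduced-precision scale used on hardware, and ties are broken toward the smaller magnitude. The function names are hypothetical.

```python
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
BLOCK_SIZE = 16  # values sharing one scale (assumed block size)


def quantize_block(block):
    """Map a block of floats to (scale, signed E2M1 grid values)."""
    amax = max(abs(x) for x in block)
    # Scale so the largest magnitude lands on the top grid value (6.0).
    scale = amax / E2M1_VALUES[-1] if amax > 0 else 1.0
    codes = []
    for x in block:
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate floats from the scale and grid values."""
    return [scale * c for c in codes]


block = [6.0, -3.0, 0.4, 1.2] + [0.0] * (BLOCK_SIZE - 4)
scale, codes = quantize_block(block)
print(scale, codes[:4])                     # 1.0 [6.0, -3.0, 0.5, 1.0]
print(dequantize_block(scale, codes)[:4])   # [6.0, -3.0, 0.5, 1.0]
```

Values that already sit on the grid survive unchanged, while 0.4 and 1.2 snap to the nearest grid points. The per-block scale keeps this rounding error proportional to the local magnitude of the values in each block.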

Download DiffusionGemma

Download from Hugging Face

Download from Kaggle

Access on Model Garden
