DiffusionGemma — Google DeepMind
DiffusionGemma<br>An experimental open model that explores an exceptionally fast approach to text generation
DiffusionGemma abandons the sequential, token-by-token process of typical autoregressive Large Language Models.
Built on Gemma 4 and Gemini Diffusion research, it prioritizes unprecedented speed and parallel layout generation, unlocking novel workflows for developers building real-time interactive AI applications.
A non-sequential transformer that generates entire paragraphs rather than individual next-token guesses, ensuring global logical consistency.
Blazing fast inference<br>By shifting the decode bottleneck from memory bandwidth to raw compute, DiffusionGemma generates tokens up to 4x-5x faster on NVIDIA GPUs, achieving over 1,000 tokens per second on a single H100.
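To see why decoding many tokens per forward pass moves the bottleneck from memory bandwidth toward compute, here is a rough back-of-envelope sketch. It assumes about 2 FLOPs per active parameter per token, 16-bit weights, and that each active weight is read from memory once per pass; the function name is hypothetical and the 256-token block size is the figure quoted on this page. It is an illustration of the idea, not a measurement of DiffusionGemma.

```python
# Rough roofline-style estimate. All modelling choices here are assumptions
# for illustration, not published DiffusionGemma internals.

def arithmetic_intensity(tokens_per_pass: int, bytes_per_param: float) -> float:
    """FLOPs performed per byte of weights read in one forward pass.

    Assumes ~2 FLOPs per active parameter per token, and that every active
    weight is read from memory once per pass (activation traffic is ignored).
    """
    flops_per_param = 2.0 * tokens_per_pass
    return flops_per_param / bytes_per_param


# One token per pass, as in token-by-token decoding (16-bit weights = 2 bytes).
sequential = arithmetic_intensity(tokens_per_pass=1, bytes_per_param=2.0)

# 256 tokens per pass, the block size quoted on this page.
parallel = arithmetic_intensity(tokens_per_pass=256, bytes_per_param=2.0)

print(f"1 token per pass:    {sequential:.0f} FLOPs per byte of weights")
print(f"256 tokens per pass: {parallel:.0f} FLOPs per byte of weights")
```

Under these assumptions each byte of weights fetched does 256 times more work in the parallel case. Treat that as an upper bound: in a Mixture of Experts model different tokens can route to different experts, so a parallel pass may read more weights, and iterative refinement needs several passes per block.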
Accessible hardware footprint<br>A Mixture of Experts (MoE) model with 26B total parameters that activates only 3.8B during inference. When quantized, it fits comfortably within 24GB of VRAM, putting it within reach of a consumer NVIDIA RTX 5090 or 4090.
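The 24GB claim can be sanity-checked with simple arithmetic. The sketch below counts weight storage only and assumes all 26B parameters stay resident in VRAM even though only 3.8B are active per token; it ignores activations, caches, and quantization scale overhead. The helper name and the bit widths tried are illustrative choices, not an official sizing guide.

```python
TOTAL_PARAMS = 26e9    # total MoE parameters, as stated on this page
VRAM_BUDGET_GB = 24    # VRAM budget quoted on this page


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Weight storage in GB (10^9 bytes); excludes activations and scale factors."""
    return num_params * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    gb = weight_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb < VRAM_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:5.1f} GB -> {verdict} in {VRAM_BUDGET_GB} GB")
```

By this estimate only a roughly 4-bit representation (about 13 GB of weights) leaves headroom inside 24GB, which is consistent with the "when quantized" qualifier.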
Bi-directional attention<br>Generating 256 tokens in parallel with each forward pass allows every token to attend to all others. This provides significant advantages for non-linear domains such as in-line editing and code infilling.
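The difference from causal attention can be shown with a toy mask. This is a minimal sketch in plain Python using a hypothetical 8-token block instead of 256 for readability; the function names are illustrative and do not come from any DiffusionGemma API.

```python
def causal_mask(n: int) -> list:
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> list:
    """Every position may attend to every other position in the block."""
    return [[True] * n for _ in range(n)]


def visible_positions(mask: list, i: int) -> int:
    """Number of positions that position i can attend to."""
    return sum(mask[i])


BLOCK = 8   # toy block size; the page quotes 256 tokens per forward pass
GAP = 2     # index of a token being filled in mid-block, e.g. during code infilling

causal = causal_mask(BLOCK)
full = bidirectional_mask(BLOCK)

print("causal:       ", visible_positions(causal, GAP), "of", BLOCK, "positions visible")
print("bidirectional:", visible_positions(full, GAP), "of", BLOCK, "positions visible")
```

With the causal mask, the token at the gap sees only itself and the text before it. With the bidirectional mask it also sees the text that follows, which is the context that in-line editing and code infilling depend on.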
Intelligent self-correction<br>The model iteratively refines its own output, allowing it to evaluate the entire text block at once to perfectly close complex formatting and fix mistakes in real-time.
Next-gen compute with NVFP4<br>Native support for NVIDIA's new NVFP4 (4-bit floating-point) format on Blackwell GPUs dramatically accelerates compute throughput, allowing the model to run at faster speeds with near-lossless accuracy.
Download DiffusionGemma
Download from Hugging Face
Download from Kaggle
Access on Model Garden