DiffusionGemma


Google DeepMind



An experimental open model that explores an exceptionally fast approach to text generation.


DiffusionGemma abandons the sequential, token-by-token process of typical autoregressive Large Language Models.

Built on Gemma 4 and Gemini Diffusion research, it prioritizes unprecedented speed and parallel layout generation, unlocking novel workflows for developers building real-time interactive AI applications.



A non-sequential transformer that generates entire paragraphs rather than individual, next-token guesses, ensuring global logical consistency


Blazing fast inference<br>By shifting the decode bottleneck from memory bandwidth to raw compute, DiffusionGemma generates tokens up to 4-5x faster on NVIDIA GPUs, reaching over 1,000 tokens per second on a single H100.
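One way to picture the shift away from memory bandwidth is to count how many weight bytes must be streamed from memory per generated token. The sketch below is a back-of-the-envelope estimate, not a measurement. The function name, the 16-bit weight width, and the figure of 16 refinement passes per block are illustrative assumptions, and the estimate ignores MoE expert routing, activations, and any cache traffic. Only the 3.8B active parameters and the 256-token parallel block come from this page.

```python
def weight_bytes_per_token(active_params, bytes_per_param, tokens_per_pass, passes_per_block=1):
    """Estimate weight bytes read from memory per generated token.

    Assumes every forward pass streams each active weight once, and that a
    block of `tokens_per_pass` tokens needs `passes_per_block` forward passes.
    """
    bytes_per_pass = active_params * bytes_per_param
    return bytes_per_pass * passes_per_block / tokens_per_pass


ACTIVE_PARAMS = 3_800_000_000  # 3.8B active parameters (figure from the page)
BYTES_PER_PARAM = 2            # assumption: 16-bit weights
REFINEMENT_PASSES = 16         # assumption: hypothetical passes per 256-token block

# Token-by-token decoding: one pass produces one token.
autoregressive = weight_bytes_per_token(ACTIVE_PARAMS, BYTES_PER_PARAM, tokens_per_pass=1)

# Block-parallel decoding: several passes refine 256 tokens together.
block_parallel = weight_bytes_per_token(
    ACTIVE_PARAMS, BYTES_PER_PARAM, tokens_per_pass=256, passes_per_block=REFINEMENT_PASSES
)

print(f"autoregressive: {autoregressive / 1e9:.2f} GB of weights per token")
print(f"block-parallel: {block_parallel / 1e9:.3f} GB of weights per token")
```

Under these assumptions the weight traffic per token falls from 7.6 GB to 0.475 GB, which is the sense in which the bottleneck can move from memory bandwidth toward arithmetic throughput.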

Accessible hardware footprint<br>A 26B-parameter Mixture of Experts (MoE) model that activates only 3.8B parameters during inference. When quantized, it fits comfortably within 24GB of VRAM, putting it within reach of a consumer NVIDIA RTX 5090 or 4090.
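The 24GB figure can be sanity-checked with simple arithmetic on the weights alone. This is a rough sketch: `weight_memory_gb` is a hypothetical helper, it assumes all 26B parameters stay resident in VRAM even though only 3.8B are active per token, and it leaves out quantization scale factors, activations, and runtime overhead.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Return the size of the weights alone in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 26_000_000_000  # 26B total parameters (figure from the page)

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_memory_gb(TOTAL_PARAMS, bits):.1f} GB")
```

At 16 bits the weights alone come to 52 GB and at 8 bits to 26 GB, both above a 24GB card. At 4 bits they come to 13 GB, which leaves headroom for everything the estimate omits.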

Bi-directional attention<br>Generating 256 tokens in parallel with each forward pass allows every token to attend to all others. This provides significant advantages for non-linear domains such as in-line editing and code infilling.
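The difference from a typical autoregressive model can be illustrated with attention masks. The sketch below is a generic illustration of causal versus bi-directional masking, not DiffusionGemma's actual implementation. The helper names are hypothetical, and the block size of 256 is taken from the text above.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every other position in the block."""
    return [[True] * n for _ in range(n)]


def visible_pairs(mask):
    """Count the (query, key) pairs the mask allows."""
    return sum(sum(row) for row in mask)


BLOCK = 256  # tokens generated in parallel per forward pass (figure from the page)

print(visible_pairs(causal_mask(BLOCK)))         # 32896
print(visible_pairs(bidirectional_mask(BLOCK)))  # 65536

# For infilling, the token at position 10 can use text that comes after it.
print(causal_mask(BLOCK)[10][200])         # False
print(bidirectional_mask(BLOCK)[10][200])  # True
```

With a causal mask, a token being filled in the middle of a block cannot see the text to its right. With a bi-directional mask it can, which is why editing and infilling benefit.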

Intelligent self-correction<br>The model iteratively refines its own output, evaluating the entire text block at once so it can close complex formatting correctly and fix mistakes in real time.
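Iterative refinement of this kind can be sketched as a loop that re-predicts every position on each pass and commits only the most confident ones. This is a toy illustration under stated assumptions, not the model's published decoding algorithm. `refine`, `toy_denoiser`, the linear commit schedule, and the hard-coded confidences are all hypothetical stand-ins for a real network.

```python
MASK = None  # marks a position that is not committed yet


def toy_denoiser(tokens):
    """Stand-in for a model: returns (token, confidence) for every position.

    It guesses "]" for the closing bracket until it can see the committed
    opening "{", at which point it switches to the matching "}".
    """
    closer = ("}", 0.95) if tokens[0] == "{" else ("]", 0.8)
    return [("{", 0.9), ('"k"', 0.6), (":", 0.7), ("1", 0.5), closer]


def refine(denoise, length, steps):
    """Re-predict the whole block each step and keep the most confident tokens."""
    tokens = [MASK] * length
    for step in range(steps):
        predictions = denoise(tokens)
        # Linear schedule: the number of committed positions grows every step.
        keep = (length * (step + 1) + steps - 1) // steps
        ranked = sorted(range(length), key=lambda i: predictions[i][1], reverse=True)
        committed = set(ranked[:keep])
        # Tokens are rebuilt from the fresh predictions, so an earlier
        # commitment can be revised once more context is available.
        tokens = [predictions[i][0] if i in committed else MASK for i in range(length)]
    return tokens


print(refine(toy_denoiser, length=5, steps=1))  # ['{', '"k"', ':', '1', ']']
print(refine(toy_denoiser, length=5, steps=3))  # ['{', '"k"', ':', '1', '}']
```

With a single pass the toy output ends in a mismatched bracket. With three passes the closing token is revised after the opening brace is committed, which mirrors the formatting repair described above.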

Next-gen compute with NVFP4<br>Native support for NVIDIA's new NVFP4 (4-bit floating-point) format on Blackwell GPUs dramatically accelerates compute throughput, letting the model run faster with near-lossless accuracy.
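To give a feel for what a 4-bit floating-point format involves, here is a simplified sketch of block-scaled quantization. It assumes the E2M1 value grid and the 16-element blocks commonly described for NVFP4, keeps each block scale as an ordinary Python float rather than an FP8 value, and omits any per-tensor scale. The function names are hypothetical, and this is not NVIDIA's implementation.

```python
import math

# Magnitudes representable by a 4-bit E2M1 float (sign is stored separately).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 16  # assumption: elements that share one scale factor


def quantize_block(values):
    """Map a block of floats to (scale, codes) with codes on the E2M1 grid."""
    peak = max(abs(v) for v in values)
    # Scale so the largest magnitude in the block lands on the top grid value.
    scale = peak / E2M1_MAGNITUDES[-1] if peak > 0 else 1.0
    codes = []
    for v in values:
        # Nearest grid magnitude; ties resolve to the smaller magnitude.
        magnitude = min(E2M1_MAGNITUDES, key=lambda m: abs(abs(v) / scale - m))
        codes.append(math.copysign(magnitude, v))
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate values from the shared scale and the 4-bit codes."""
    return [scale * c for c in codes]


block = [12.0, -6.0, 2.2, 0.0] + [0.0] * (BLOCK_SIZE - 4)
scale, codes = quantize_block(block)
print(scale)                               # 2.0
print(dequantize_block(scale, codes)[:4])  # [12.0, -6.0, 2.0, 0.0]
```

Values that sit on the scaled grid survive the round trip exactly, while 2.2 is rounded to 2.0. Sharing one scale across a small block keeps such rounding errors proportional to the local magnitude of the weights.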

Download DiffusionGemma

Download from Hugging Face

Download from Kaggle

Access on Model Garden

