Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models

TL;DR — Reasoning by editing, not regenerating. Reflective Masking turns a Mask Diffusion Model into a multi-turn reviser: it erases uncertain tokens, regenerates only what is needed, and remembers previous attempts.

Abstract

Recent diffusion language models, such as Google's DiffusionGemma, show that text generation need not be left-to-right: a model can refine a whole canvas using bidirectional context. We ask a complementary question: can existing Mask Diffusion Models (MDMs) be taught to reason by revising their own previous outputs? We propose Reflective Masking (RM), a lightweight post-training method that turns masking into a model-driven decision (keep reliable tokens, re-mask uncertain ones, and reveal better replacements), making an MDM a multi-turn reviser rather than a one-shot decoder. To support multi-turn correction, we add History Reference, a parameter-free memory that exposes the denoising trajectory to the model. Unlike approaches that depend on a large pretrained diffusion LM, RM requires no architectural changes and no online rollouts, and it drops into existing MDMs across Sudoku, text reasoning, and image editing, enabling sparse, iterative self-revision.

1. Re-masking is the self-correction MDMs were missing. A standard MDM could in principle edit any position, but once a token is unmasked it stays fixed, so early mistakes get locked in. RM makes masking a model-driven decision (keep reliable tokens, re-mask uncertain ones, reveal better replacements), so the model fixes its own errors instead of carrying them forward.

2. A lightweight post-training recipe with no new architecture. RM is enabled by a scalable offline data pipeline (no online rollouts) and drops into existing MDMs unchanged; it is validated across text, Sudoku, and image editing.

3. History Reference: a memory of past attempts, for free. A parameter-free mechanism carries the denoising trajectory forward, so the model remembers what it already tried and stops repeating the same error.
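The revise-and-remember loop described above (keep reliable tokens, re-mask uncertain ones, refill them, and keep every attempt) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `reflective_revise`, the fixed confidence `threshold`, the use of `None` as the mask token, and the toy `oracle_score` / `no_repeat_fill` stand-ins for a trained MDM are all hypothetical. In RM the keep/re-mask decision is learned; reading it out through a threshold on a score is our simplification. Passing the full `history` to both functions plays the role of a parameter-free memory of the trajectory.

```python
from typing import Callable, List, Optional, Tuple

MASK = None  # hypothetical stand-in for the [MASK] token

Canvas = List[int]
Masked = List[Optional[int]]
ScoreFn = Callable[[Canvas, List[Canvas]], List[float]]
FillFn = Callable[[Masked, List[Canvas]], Canvas]


def reflective_revise(
    canvas: Canvas,
    score: ScoreFn,
    fill: FillFn,
    threshold: float = 0.5,
    max_rounds: int = 8,
) -> Tuple[Canvas, List[Canvas]]:
    """Multi-turn revision: keep confident tokens, re-mask the rest, refill.

    `score` and `fill` stand in for a trained MDM (assumption); both see
    every past canvas, acting as a parameter-free memory of the trajectory.
    """
    history: List[Canvas] = [list(canvas)]
    for _ in range(max_rounds):
        conf = score(canvas, history)
        uncertain = {i for i, c in enumerate(conf) if c < threshold}
        if not uncertain:
            break  # every token judged reliable: stop revising early
        masked: Masked = [MASK if i in uncertain else tok for i, tok in enumerate(canvas)]
        canvas = fill(masked, history)
        history.append(list(canvas))
    return canvas, history


# Toy stand-ins (hypothetical): an oracle scorer that knows the answer and a
# filler that never re-proposes a digit already tried at that position.
TARGET = [1, 2, 3, 4]


def oracle_score(canvas: Canvas, history: List[Canvas]) -> List[float]:
    return [1.0 if tok == goal else 0.0 for tok, goal in zip(canvas, TARGET)]


def no_repeat_fill(masked: Masked, history: List[Canvas]) -> Canvas:
    out: Canvas = []
    for pos, tok in enumerate(masked):
        if tok is not MASK:
            out.append(tok)  # kept token passes through unchanged
            continue
        tried = {state[pos] for state in history}
        # Smallest digit not yet tried here (assumes one remains untried).
        out.append(min(d for d in range(1, 10) if d not in tried))
    return out


final, hist = reflective_revise([1, 9, 3, 8], oracle_score, no_repeat_fill)
print(final, len(hist))  # [1, 2, 3, 4] 5
```

Because the toy filler consults the history, it never proposes the same wrong digit twice at a position, a very simplified analogue of the reduction in repeated mistakes attributed to History Reference. Each extra round is one more step of selective revision, which is where test-time compute goes in this picture.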

CoT thinks by continuing. RM thinks by revising.

A diffusion-native analogue of chain-of-thought reflection.

Side-by-side: AR Reasoning vs. Reflective Masking Reasoning

| AR reasoning / reflection | Reflective Masking in MDMs |
| --- | --- |
| Generates thoughts left-to-right | Revises a full canvas bidirectionally |
| Corrects mistakes by appending more text or regenerating | Corrects mistakes by re-masking only unreliable tokens |
| Past mistakes remain in context | Wrong tokens can be erased from the current state |
| Test-time scaling = longer traces / more samples | Test-time scaling = more rounds of selective revision |
| Memory is textual context | Memory is History Reference over denoising states |
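The comparison above describes memory in RM as History Reference over denoising states. The source does not give its exact form, so the following is only a hypothetical illustration of what a parameter-free trajectory memory could look like: for each position, a tally of the tokens that appeared there in earlier states, with older states down-weighted by a fixed factor. `history_summary` and its `decay` argument are our own assumptions, not the authors' History Reference or the decay variant in the Sudoku results.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


def history_summary(history: List[List[int]], decay: float = 0.5) -> List[Dict[int, float]]:
    """Per-position, decay-weighted tally of tokens seen in past states.

    The most recent state gets weight 1, the one before it `decay`, then
    `decay**2`, and so on. No learned parameters are involved: the only
    knob is the fixed decay factor (a hypothetical choice).
    """
    if not history:
        return []
    n = len(history[0])
    tallies: List[DefaultDict[int, float]] = [defaultdict(float) for _ in range(n)]
    for age, state in enumerate(reversed(history)):  # age 0 = newest state
        weight = decay ** age
        for pos, tok in enumerate(state):
            tallies[pos][tok] += weight
    return [dict(t) for t in tallies]


print(history_summary([[1, 2], [3, 2]]))  # [{3: 1.0, 1: 0.5}, {2: 1.5}]
```

With `decay=1.0` every past state counts equally; smaller values make the memory favor recent attempts. A model conditioned on such a summary could see that, say, digit 1 was already tried at position 0 and avoid proposing it again.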

Results

Reasoning through explicit revision


Three task families, from instruction-rich image editing to open-ended text reasoning. Reflective Masking consistently beats masking-based baselines, and History Reference helps most where the model must explore on its own. All models are trained in about 5 hours on 2×H100 GPUs.

Sudoku — structured error correction

A tiny MDM (0.81M parameters) trained from scratch recovers 9×9 boards with 4–20 corrupted cells through iterative re-masking. History Reference (HR) sharply cuts repeated mistakes and rule conflicts; adding History Embedding Rotation (HER) gives the best result on every metric.
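To make the Sudoku setup and two of its metrics concrete, here is a small sketch of corrupting a solved board and counting rule conflicts. It rests on assumptions: the corruption scheme (overwrite k random cells with a wrong digit) and the conflict definition (a cell counts if a row, column, or 3×3 box peer holds the same digit) are our reading of "corrupted cells" and of the conflict-cells-per-board metric, not definitions given by the authors. `solved_board`, `corrupt`, `peers`, and `conflict_cells` are hypothetical names.

```python
import random
from typing import List, Set, Tuple

Board = List[List[int]]
Cell = Tuple[int, int]


def solved_board() -> Board:
    """A valid 9x9 solution built from a standard shifting pattern."""
    return [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]


def corrupt(board: Board, k: int, rng: random.Random) -> Tuple[Board, List[Cell]]:
    """Overwrite k distinct cells, each with a digit different from the original."""
    out = [row[:] for row in board]
    cells = rng.sample([(r, c) for r in range(9) for c in range(9)], k)
    for r, c in cells:
        out[r][c] = rng.choice([d for d in range(1, 10) if d != board[r][c]])
    return out, cells


def peers(r: int, c: int) -> Set[Cell]:
    """All other cells sharing a row, column, or 3x3 box with (r, c)."""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    ps = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    ps |= {(br + i, bc + j) for i in range(3) for j in range(3)}
    ps.discard((r, c))
    return ps


def conflict_cells(board: Board) -> int:
    """Number of cells whose digit also appears in at least one peer."""
    return sum(
        1
        for r in range(9)
        for c in range(9)
        if any(board[i][j] == board[r][c] for i, j in peers(r, c))
    )


solution = solved_board()
print(conflict_cells(solution))  # 0: a solved board has no conflicts

bad = [row[:] for row in solution]
bad[0][0] = 2  # clashes with row 0, column 0, and the top-left box
print(conflict_cells(bad))  # 3

rng = random.Random(0)
noisy, changed = corrupt(solution, rng.randint(4, 20), rng)  # 4 to 20 wrong cells
```

Under this reading, a board with zero conflict cells satisfies every rule yet may still differ from the reference solution, which would be one explanation for the valid rate sitting slightly above exact accuracy in the results table.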

Reflective masking on Sudoku. Two actual revision trajectories: the model re-masks cells it is unsure about (amber) and re-predicts them, turning wrong digits (red) into correct ones until the board is valid and the error count reaches 0.

| Variant | Exact accuracy (%) ↑ | Valid rate (%) ↑ | Replay mistakes (%) ↓ | Conflict cells per board ↓ |
| --- | --- | --- | --- | --- |
| RM (no History Reference) | 82.4 | 86.6 | 0.57 | 0.578 |
| RM + HR | 91.4 (↑9.0) | 91.8 (↑5.2) | 0.07 (↓0.50) | 0.300 (↓0.278) |
| RM + HR + decay | 89.4 (↑7.0) | 89.6 (↑3.0) | 0.07 (↓0.50) | 0.362 (↓0.216) |
| Ours (RM + HR + decay + HER) | 93.4 (↑11.0) | 93.6 (↑7.0) | 0.03 (↓0.54) | 0.236 (↓0.342) |

Quantitative results on Sudoku revision. The arrowed values give the change (Δ) versus the RM (no History Reference) baseline; the full model, RM + HR + decay + HER, achieves the best value in every column.
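The Δ values in the Sudoku table are absolute differences from the RM (no History Reference) baseline (percentage points for the first three metrics, cells per board for the last), not relative changes. The short check below re-types the table's numbers and recomputes them; `deltas`, `BASELINE`, and `VARIANTS` are illustrative names, not part of any released code.

```python
from typing import Dict

Metrics = Dict[str, float]

# Numbers re-typed from the Sudoku results table.
BASELINE: Metrics = {"exact": 82.4, "valid": 86.6, "replay": 0.57, "conflict": 0.578}
VARIANTS: Dict[str, Metrics] = {
    "RM + HR": {"exact": 91.4, "valid": 91.8, "replay": 0.07, "conflict": 0.300},
    "RM + HR + decay": {"exact": 89.4, "valid": 89.6, "replay": 0.07, "conflict": 0.362},
    "RM + HR + decay + HER": {"exact": 93.4, "valid": 93.6, "replay": 0.03, "conflict": 0.236},
}


def deltas(row: Metrics, base: Metrics) -> Metrics:
    """Signed absolute change versus the baseline, rounded to absorb float noise."""
    return {k: round(row[k] - base[k], 3) for k in base}


for name, row in VARIANTS.items():
    print(name, deltas(row, BASELINE))
```

Laid out this way, the decay-only variant trails RM + HR on accuracy and conflict cells, and the full model's gain over RM + HR appears only once HER is added.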

Relation to DiffusionGemma (Google). DiffusionGemma independently validates reasoning-by-revision on Sudoku: per its model card, exact-solve accuracy rises from 18% (one-shot) to 89.5% purely by revising over steps, and from 1.5% to 89.5% after fine-tuning a large pretrained model for 4,000 steps. Reflective Masking reaches a higher 93.4% exact accuracy with a 0.81M-parameter MDM trained from scratch, orders of magnitude smaller than DiffusionGemma's fine-tuned backbone, and extends the same revision mechanism beyond text to image editing, a modality DiffusionGemma does not support.

DiffusionGemma: Google, “DiffusionGemma: 4× faster text...
