Feedback Alignment in Self-Distillation

[2606.11173] The Role of Feedback Alignment in Self-Distillation


Computer Science > Artificial Intelligence

arXiv:2606.11173 (cs)

[Submitted on 9 Jun 2026]

Title: The Role of Feedback Alignment in Self-Distillation

Authors: Semih Kara, Oğuzhan Ersoy


Abstract: Conditioning a language model on additional context, such as feedback on a previous attempt, typically improves its response. Self-distillation trains the model to retain this improvement when the context is not present. The method works by matching the model's output distribution under two settings: a student that sees only the question, and a self-teacher that also sees the context. What the model learns therefore depends on what context the self-teacher receives, yet the design of this context remains largely unexplored.
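The distribution matching described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of a per-token KL(teacher || student) objective, the function names, and the toy logits are all assumptions made for illustration. The student logits stand for the model's next-token scores given only the question, and the teacher logits for the same model's scores given the question plus the extra context.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def self_distillation_loss(student_logits, teacher_logits):
    """Mean per-token KL(teacher || student) over response positions.

    Assumption: the KL direction and the mean reduction are illustrative
    choices, not details confirmed by the abstract.
    """
    per_token = [
        kl_divergence(softmax(t), softmax(s))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token), per_token


# Toy example (invented numbers): 3 response positions, vocabulary of 4 tokens.
# The teacher disagrees with the student only at the second position.
student = [[2.0, 0.5, 0.1, 0.1], [0.2, 1.5, 0.3, 0.1], [1.0, 1.0, 1.0, 1.0]]
teacher = [[2.0, 0.5, 0.1, 0.1], [0.2, 0.3, 1.5, 0.1], [1.0, 1.0, 1.0, 1.0]]
loss, per_token = self_distillation_loss(student, teacher)
print(loss, per_token)
```

In this toy case the loss is nonzero only at the position where the context changes the teacher's prediction, which is the sense in which the choice of context determines what the student is trained to change.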

We study context design for self-distillation by training a solver on feedback from a frozen critic. We compare three conditions: (i) a binary reward (GRPO), (ii) the reference solution, and (iii) a step-by-step critique aligned to the solver's reasoning trace.

Step-aligned critique yields the largest gains, outperforming GRPO by 16.11 points and reference-solution-conditioned self-distillation by 5.27 points (Avg@12). Per-token advantage analysis reveals why: step-aligned feedback targets only the tokens where reasoning fails, leaving correct behavior intact. Conditioning on the reference solution, by contrast, pressures the model to change its behavior at every token (even correct steps) because an alternative derivation inevitably differs in phrasing and approach. This suggests that structural alignment between feedback and the solver's reasoning is a key driver of self-distillation effectiveness.
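The contrast between targeted and diffuse training pressure can be sketched as follows. This is a hedged illustration only: the abstract does not define its per-token advantage, so the log-probability difference used here (teacher minus student on each sampled token), the threshold, the helper names, and every number are assumptions invented for the example, not results from the paper.

```python
def per_token_advantage(teacher_logprobs, student_logprobs):
    """Assumed definition: teacher log-prob minus student log-prob per token."""
    return [t - s for t, s in zip(teacher_logprobs, student_logprobs)]


def fraction_pressured(advantages, threshold=0.1):
    """Share of tokens whose advantage magnitude exceeds the threshold."""
    return sum(1 for a in advantages if abs(a) > threshold) / len(advantages)


# Invented toy log-probs for a 4-token response whose third token is the
# faulty reasoning step.
student_lp = [-0.2, -0.3, -0.1, -0.4]

# Hypothetical step-aligned teacher: agrees everywhere except the faulty token.
step_aligned_lp = [-0.2, -0.3, -2.5, -0.4]

# Hypothetical reference-conditioned teacher: prefers different phrasing
# throughout, so it disagrees at every token.
reference_lp = [-0.9, -1.1, -2.5, -1.2]

step_adv = per_token_advantage(step_aligned_lp, student_lp)
ref_adv = per_token_advantage(reference_lp, student_lp)

print(fraction_pressured(step_adv))  # 0.25
print(fraction_pressured(ref_adv))   # 1.0
```

Under these assumed numbers, the step-aligned teacher pushes on one token in four, while the reference-conditioned teacher pushes on all of them, including tokens that were already correct.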

Comments: Accepted to the ICML 2026 Workshop on RL from World Feedback (RLxF)

Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)

Cite as: arXiv:2606.11173 [cs.AI]

(or arXiv:2606.11173v1 [cs.AI] for this version)

https://doi.org/10.48550/arXiv.2606.11173


arXiv-issued DOI via DataCite (pending registration)

Submission history: From Semih Kara. [v1] Tue, 9 Jun 2026 17:50:09 UTC (10,633 KB)


