Reinforcement Learning with Metacognitive Feedback

[2606.32032] Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

arXiv:2606.32032 (cs)

[Submitted on 30 Jun 2026]

Title: Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

Authors: Gabrielle Kaili-May Liu, Avi Caciularu, Gal Yona, Idan Szpektor, Arman Cohan

Abstract: Metacognition is a critical component of intelligence that describes the ability to monitor and regulate one's own cognitive processes. Yet LLMs exhibit systemic deficiencies in key metacognitive faculties: they hallucinate with high confidence, fail to recognize knowledge boundaries, and misrepresent their internal uncertainty, undermining trustworthiness and reliability. Since monitoring task performance and adapting behavior accordingly are central to metacognition, we posit that models capable of accurately judging their own performance are better positioned to improve it. We operationalize this idea via two novel mechanisms: reinforcement learning with metacognitive feedback (RLMF), a paradigm to refine completion rankings during preference optimization based on the quality of a model's self-judgments of performance, and metacognitive data selection, which uses similar self-judgments to identify high-value training examples, outperforming naive active learning. We apply these innovations to the problem of faithful calibration (FC), a task that is itself fundamentally metacognitive: the goal is to align expressed with intrinsic uncertainty, difficult even for frontier LLMs. We adopt a two-stage, decoupled approach, first using these methods to calibrate the faithfulness of models' self-reported confidence scores, then mapping to natural, context-adaptable linguistic uncertainty via targeted output editing. Extensive experiments show RLMF achieves generalizable, state-of-the-art FC on diverse tasks while preserving accuracy. Further, RLMF surpasses standard RL by up to 63% while enhancing models' ability to assess and express their own capability limits. This positions RLMF as a promising paradigm to enhance LLM metacognition toward improved abilities and alignment, and suggests metacognitive performance as an effective RL signal to overcome limits of prior intrinsic feedback methods.
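The idea of refining completion rankings by the quality of a model's self-judgments can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state the scoring rule or how rankings are combined, so the sketch assumes a Brier-style score between a self-reported confidence and a graded outcome, and uses it only to break ties among completions with the same correctness. The names `Completion`, `metacognitive_score`, `rank_completions`, and `preference_pairs` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Completion:
    text: str
    correct: bool           # outcome from an external grader (assumed available)
    self_confidence: float  # model's own estimate in [0, 1] that it is correct


def metacognitive_score(c: Completion) -> float:
    """Assumed scoring rule: negative squared error between the model's
    self-judgment and the graded outcome (higher is better)."""
    return -(c.self_confidence - float(c.correct)) ** 2


def _key(c: Completion):
    # Primary key: task correctness. Secondary key: self-judgment quality.
    return (c.correct, metacognitive_score(c))


def rank_completions(completions):
    """Order completions from most to least preferred under the assumed keys."""
    return sorted(completions, key=_key, reverse=True)


def preference_pairs(completions):
    """Build (preferred, rejected) text pairs for preference optimization,
    skipping pairs whose ranking keys are identical."""
    ranked = rank_completions(completions)
    return [(a.text, b.text) for a, b in combinations(ranked, 2)
            if _key(a) != _key(b)]
```

Under these assumptions, two equally correct completions are no longer tied: the one whose stated confidence better matches its outcome is preferred, which is one way a metacognitive signal could sharpen the pairs fed to a preference-optimization step.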

Comments: Code: this https URL

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.32032 [cs.CL] (or arXiv:2606.32032v1 [cs.CL] for this version)

https://doi.org/10.48550/arXiv.2606.32032

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Gabrielle Liu

[v1] Tue, 30 Jun 2026 17:56:01 UTC (3,482 KB)
