Interpretable Coreference Resolution Evaluation Using Explicit Semantics


Bruno Gatti, Giuliano Martinelli, Roberto Navigli


Abstract

Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insight: they penalize errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, which makes it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a semantically enhanced evaluation framework for coreference resolution. Our approach overlays Concept and Named Entity Recognition (CNER) onto coreference outputs, assigning semantic labels to nominal mentions and propagating them to entire coreference clusters. This enables the computation of typed scores aimed at evaluating mention extraction and linking capabilities stratified by semantic class. Across our experiments on OntoNotes, LitBank, and PreCo, we show that our framework uncovers systematic weaknesses that remain obscured by aggregate metrics. Furthermore, we show that these diagnostics can be used to design targeted, low-cost data augmentation strategies, achieving measurable out-of-domain improvements.
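The pipeline the abstract describes (label nominal mentions with CNER, propagate the labels to whole clusters, then score per semantic class) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: spans are inclusive token-index pairs, a plain dict stands in for CNER output, majority vote with first-seen tie-breaking is an assumed propagation rule, and the two typed scores shown (per-class mention recall for extraction, per-class MUC recall for linking) are illustrative choices rather than the paper's exact metrics. All function names are hypothetical.

```python
# Hypothetical sketch: names, propagation rule, and scores are assumptions,
# not the paper's code.
from collections import Counter, defaultdict
from typing import Optional

Span = tuple[int, int]      # (start_token, end_token), inclusive
Cluster = list[Span]


def propagate_labels(clusters: list[Cluster],
                     mention_labels: dict[Span, str]) -> list[Optional[str]]:
    """Give each cluster the majority label of its labelled (nominal) mentions.

    `mention_labels` stands in for CNER output: pronouns and other unlabelled
    mentions are simply absent. Ties go to the label seen first in the
    cluster (an assumption). Clusters with no labelled mention get None.
    """
    cluster_labels = []
    for cluster in clusters:
        labels = [mention_labels[m] for m in cluster if m in mention_labels]
        if not labels:
            cluster_labels.append(None)
            continue
        counts = Counter(labels)
        top = max(counts.values())
        cluster_labels.append(next(lab for lab in labels if counts[lab] == top))
    return cluster_labels


def typed_mention_recall(gold, pred, gold_types):
    """Mention-extraction recall per semantic class (exact span match)."""
    predicted = {m for cluster in pred for m in cluster}
    found, total = defaultdict(int), defaultdict(int)
    for cluster, label in zip(gold, gold_types):
        for m in cluster:
            total[label] += 1
            found[label] += int(m in predicted)
    return {label: found[label] / total[label] for label in total}


def typed_muc_recall(gold, pred, gold_types):
    """MUC recall restricted to the gold clusters of each semantic class."""
    pred_id = {m: i for i, cluster in enumerate(pred) for m in cluster}
    num, den = defaultdict(int), defaultdict(int)
    for cluster, label in zip(gold, gold_types):
        # Partition the gold cluster by predicted cluster; mentions the
        # system missed each form their own singleton part.
        parts = {pred_id.get(m, ("missed", m)) for m in cluster}
        num[label] += len(cluster) - len(parts)
        den[label] += len(cluster) - 1
    return {label: num[label] / den[label] for label in den if den[label] > 0}


if __name__ == "__main__":
    # "Alice met Bob in Paris . She said the city was lovely ."
    labels = {(0, 0): "PERSON", (2, 2): "PERSON",
              (4, 4): "LOCATION", (8, 9): "LOCATION"}  # hypothetical CNER tags
    gold = [[(0, 0), (6, 6)], [(4, 4), (8, 9)], [(2, 2)]]
    pred = [[(0, 0), (6, 6)], [(4, 4)], [(8, 9)]]  # misses Bob, splits Paris/the city
    types = propagate_labels(gold, labels)
    print({k: round(v, 2) for k, v in typed_mention_recall(gold, pred, types).items()})
    # {'PERSON': 0.67, 'LOCATION': 1.0}
    print(typed_muc_recall(gold, pred, types))
    # {'PERSON': 1.0, 'LOCATION': 0.0}
```

In this toy example the per-class split separates two different failure modes: the PERSON errors are purely missed mentions, while every LOCATION mention is found but none are linked. A single aggregate score would blend these together.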

Anthology ID: 2026.acl-long.2126
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45854–45872
URL: https://aclanthology.org/2026.acl-long.2126/
DOI: 10.18653/v1/2026.acl-long.2126
Bibkey: gatti-etal-2026-interpretable
Cite (ACL): Bruno Gatti, Giuliano Martinelli, and Roberto Navigli. 2026. Interpretable Coreference Resolution Evaluation Using Explicit Semantics. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45854–45872, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Interpretable Coreference Resolution Evaluation Using Explicit Semantics (Gatti et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2126.pdf
Checklist: 2026.acl-long.2126.checklist.pdf


BibTeX:

@inproceedings{gatti-etal-2026-interpretable,
    title = "Interpretable Coreference Resolution Evaluation Using Explicit Semantics",
    author = "Gatti, Bruno and
      Martinelli, Giuliano and
      Navigli, Roberto",
    editor = "Liakata, Maria and
      Moreira, Viviane P. and
      Zhang, Jiajun and
      Jurgens, David",
    booktitle = "Proceedings of the 64th Annual Meeting of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2026",
    address = "San Diego, California, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.acl-long.2126/",
    doi = "10.18653/v1/2026.acl-long.2126",
    pages = "45854--45872",
    ISBN = "979-8-89176-390-6",
    abstract = "Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insights, penalizing errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, and making it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a...
