Interpretable Coreference Resolution Evaluation Using Explicit Semantics

Bruno Gatti, Giuliano Martinelli, Roberto Navigli


Abstract

Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insights, penalizing errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, and making it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a semantically-enhanced evaluation framework for coreference resolution. Our approach overlays Concept and Named Entity Recognition (CNER) onto coreference outputs, assigning semantic labels to nominal mentions and propagating them to entire coreference clusters. This enables the computation of typed scores aimed at evaluating mention extraction and linking capabilities stratified by semantic class. Across our experiments on OntoNotes, LitBank, and PreCo, we show that our framework uncovers systematic weaknesses that remain obscured by aggregate metrics. Furthermore, we show that these diagnostics can be used to design targeted, low-cost data augmentation strategies, achieving measurable out-of-domain improvements.
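To make the label-propagation idea concrete, here is a minimal sketch of one way typed mention-extraction scores could be computed. It is not the authors' implementation: the abstract does not say how labels are propagated, so majority voting over a cluster's labeled mentions is an assumption, as are the function names `propagate_labels` and `typed_mention_scores`, the `UNTYPED` fallback label, and the representation of mentions as `(start, end)` token spans. Only mention extraction is scored here; typed linking scores are out of scope for this sketch.

```python
from collections import Counter, defaultdict

UNTYPED = "UNTYPED"  # hypothetical fallback for clusters with no labeled mention


def propagate_labels(clusters, mention_labels):
    """Map every mention span to the majority label of its cluster.

    Majority voting is an assumption of this sketch, not a rule
    stated in the abstract.
    """
    span_to_type = {}
    for cluster in clusters:
        # Only some mentions (e.g. nominal ones) carry a semantic label.
        votes = Counter(mention_labels[s] for s in cluster if s in mention_labels)
        label = votes.most_common(1)[0][0] if votes else UNTYPED
        for span in cluster:
            span_to_type[span] = label
    return span_to_type


def typed_mention_scores(gold_clusters, pred_clusters, mention_labels):
    """Compute mention-extraction precision, recall and F1 per semantic class."""
    gold_by_type, pred_by_type = defaultdict(set), defaultdict(set)
    for span, label in propagate_labels(gold_clusters, mention_labels).items():
        gold_by_type[label].add(span)
    for span, label in propagate_labels(pred_clusters, mention_labels).items():
        pred_by_type[label].add(span)

    scores = {}
    for label in sorted(set(gold_by_type) | set(pred_by_type)):
        gold, pred = gold_by_type[label], pred_by_type[label]
        true_positives = len(gold & pred)
        precision = true_positives / len(pred) if pred else 0.0
        recall = true_positives / len(gold) if gold else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


# Toy data: spans are (start, end) token offsets; unlabeled spans stand in
# for mentions such as pronouns that receive no semantic label of their own.
gold_clusters = [[(0, 1), (5, 5), (9, 10)], [(3, 4), (12, 13)]]
pred_clusters = [[(0, 1), (5, 5)], [(3, 4), (9, 10)]]
mention_labels = {(0, 1): "PER", (3, 4): "LOC", (12, 13): "LOC"}

for label, result in typed_mention_scores(
    gold_clusters, pred_clusters, mention_labels
).items():
    print(label, result)
```

In the toy data, the span `(9, 10)` inherits a different label in the gold and predicted clusterings, so the same clustering error lowers recall for one class and precision for another. A single aggregate score would not show that split.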

Anthology ID: 2026.acl-long.2126
Volume: Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: ACL
Publisher: Association for Computational Linguistics
Pages: 45854–45872
URL: https://aclanthology.org/2026.acl-long.2126/
DOI: 10.18653/v1/2026.acl-long.2126
Bibkey: gatti-etal-2026-interpretable
Cite (ACL): Bruno Gatti, Giuliano Martinelli, and Roberto Navigli. 2026. Interpretable Coreference Resolution Evaluation Using Explicit Semantics. In Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 45854–45872, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Interpretable Coreference Resolution Evaluation Using Explicit Semantics (Gatti et al., ACL 2026)
PDF: https://aclanthology.org/2026.acl-long.2126.pdf
Checklist: 2026.acl-long.2126.checklist.pdf


BibTeX

@inproceedings{gatti-etal-2026-interpretable,
    title = "Interpretable Coreference Resolution Evaluation Using Explicit Semantics",
    author = "Gatti, Bruno and Martinelli, Giuliano and Navigli, Roberto",
    editor = "Liakata, Maria and Moreira, Viviane P. and Zhang, Jiajun and Jurgens, David",
    booktitle = "Proceedings of the 64th Annual Meeting of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2026",
    address = "San Diego, California, United States",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.acl-long.2126/",
    doi = "10.18653/v1/2026.acl-long.2126",
    pages = "45854--45872",
    ISBN = "979-8-89176-390-6",
    abstract = "Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insights, penalizing errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, and making it difficult to interpret model capabilities or derive actionable improvements. We address this gap by introducing a...
