Qualitative and Quantitative Analysis of Diversity in Cross-document Coreference Resolution Datasets

2021-10-16 · ACL ARR October 2021

Anonymous

Abstract

Established cross-document coreference resolution (CDCR) datasets contain manually annotated, event-centric mentions of events and entities that form coreference chains with identity relations. In this paper, we qualitatively and quantitatively compare the annotation schemes of ECB+, a CDCR dataset with identity coreference relations, and NewsWCL50, a CDCR dataset with identity, bridging, and near-identity coreference relations. The analysis shows that the coreference chains of NewsWCL50 are more lexically diverse than those of ECB+, but that annotating NewsWCL50 leads to lower inter-coder reliability. We propose a phrasing diversity metric (PD) that, unlike previously proposed metrics, accounts for the diversity of full phrases. We discuss the different tasks that the two CDCR datasets pose for CDCR models, i.e., lexical disambiguation and lexical diversity challenges, and propose a direction for further CDCR evaluation.

Tasks

Coreference Resolution, Cross-Document Coreference Resolution, Diversity
