Qualitative and Quantitative Analysis of Diversity in Cross-document Coreference Resolution Datasets

2021-10-16 · ACL ARR October 2021

Anonymous

Abstract

Established cross-document coreference resolution (CDCR) datasets contain manually annotated event-centric mentions of events and entities that form coreference chains with identity relations. In this paper, we qualitatively and quantitatively compare the annotation schemes of ECB+, a CDCR dataset with identity coreference relations, and NewsWCL50, a CDCR dataset with identity, bridging, and near-identity coreference relations. The analysis shows that the coreference chains of NewsWCL50 are more lexically diverse than those of ECB+, but that annotating NewsWCL50 yields lower inter-coder reliability. We propose a phrasing diversity metric (PD) that, unlike previously proposed metrics, accounts for the diversity of full phrases. We discuss the different tasks that the two CDCR datasets pose for CDCR models, i.e., lexical disambiguation and lexical diversity challenges, and propose a direction for further CDCR evaluation.
