On Generalization in Coreference Resolution

2021-09-20 · CRAC (ACL) 2021

Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel

Code

github.com/shtoshni92/fast-coref
Official · In paper · PyTorch · ★ 36
github.com/shtoshni/fast-coref
PyTorch · ★ 36

Abstract

While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture by using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance, leading to better generalization in coreference resolution models. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.
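
The abstract mentions sampling to balance data quantities across the training datasets. One possible way to do this is sketched below: each dataset contributes at most a fixed number of documents per epoch, so a large corpus cannot drown out a small one. This is a minimal illustration, not the authors' implementation; the function name `sample_balanced_epoch`, the `max_docs_per_dataset` cap, and the toy corpus sizes are all hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

Doc = TypeVar("Doc")


def sample_balanced_epoch(
    datasets: Dict[str, Sequence[Doc]],
    max_docs_per_dataset: int,
    seed: int = 0,
) -> List[Doc]:
    """Build one training epoch by capping each dataset's contribution.

    Hypothetical helper: the cap is an assumed hyperparameter, not a value
    taken from the paper.
    """
    rng = random.Random(seed)
    epoch: List[Doc] = []
    # Iterate in sorted order so the result is reproducible for a given seed.
    for name in sorted(datasets):
        docs = list(datasets[name])
        # Small datasets are used in full; large ones are downsampled.
        k = min(max_docs_per_dataset, len(docs))
        epoch.extend(rng.sample(docs, k))
    # Interleave documents from different datasets within the epoch.
    rng.shuffle(epoch)
    return epoch


# Toy corpora with made-up sizes, standing in for three training sets.
toy_datasets = {
    "large_a": [f"a_{i}" for i in range(1000)],
    "large_b": [f"b_{i}" for i in range(5000)],
    "small_c": [f"c_{i}" for i in range(80)],
}
epoch_docs = sample_balanced_epoch(toy_datasets, max_docs_per_dataset=100, seed=13)
print(len(epoch_docs))  # 100 + 100 + 80 = 280
```

Resampling with a different seed each epoch lets the model eventually see most of the larger corpora while keeping the per-epoch mixture balanced.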

Tasks

Coreference Resolution, Data Augmentation

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
LitBank	longdoc S (OntoNotes + PreCo + LitBank)	F1	78.2	—	Unverified
OntoNotes	longdoc S (OntoNotes + PreCo + LitBank + 30k pseudo-singletons)	F1	79.6	—	Unverified
OntoNotes	longdoc S (OntoNotes + PreCo + LitBank)	F1	79.2	—	Unverified
OntoNotes	longdoc S (OntoNotes + 60k pseudo-singletons)	F1	80.6	—	Unverified
PreCo	longdoc S (OntoNotes + PreCo + LitBank)	F1	87.6	—	Unverified
Quizbowl	longdoc S (OntoNotes + PreCo + LitBank)	F1	42.9	—	Unverified
WikiCoref	longdoc S (OntoNotes + PreCo + LitBank + 30k pseudo-singletons)	F1	62.5	—	Unverified
WikiCoref	longdoc S (OntoNotes + PreCo + LitBank)	F1	60.3	—	Unverified
Winograd Schema Challenge	longdoc S (OntoNotes + PreCo + LitBank + 30k pseudo-singletons)	Accuracy	59.4	—	Unverified
Winograd Schema Challenge	longdoc S (OntoNotes + PreCo + LitBank)	Accuracy	60.1	—	Unverified
