On Generalization in Coreference Resolution
Shubham Toshniwal, Patrick Xia, Sam Wiseman, Karen Livescu, Kevin Gimpel
Code
- github.com/shtoshni92/fast-coref (official, in paper; PyTorch; ★ 36)
- github.com/shtoshni/fast-coref (PyTorch; ★ 36)
Abstract
While coreference resolution is defined independently of dataset domain, most models for performing coreference resolution do not transfer well to unseen domains. We consolidate a set of 8 coreference resolution datasets targeting different domains to evaluate the off-the-shelf performance of models. We then mix three datasets for training; even though their domain, annotation guidelines, and metadata differ, we propose a method for jointly training a single model on this heterogeneous data mixture by using data augmentation to account for annotation differences and sampling to balance the data quantities. We find that in a zero-shot setting, models trained on a single dataset transfer poorly while joint training yields improved overall performance, leading to better generalization in coreference resolution models. This work contributes a new benchmark for robust coreference resolution and multiple new state-of-the-art results.
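The abstract mentions sampling to balance the data quantities of the training datasets. As a rough illustration only, the sketch below caps how many documents each dataset contributes per epoch, so a large corpus does not swamp a small one. The function name `sample_epoch`, the `cap` parameter, and the toy corpus sizes are assumptions made for this example; they are not taken from the paper or its code, and the authors' actual sampling scheme may differ.

```python
import random
from typing import Dict, List, Optional


def sample_epoch(
    datasets: Dict[str, List[str]],
    cap: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build one epoch's training list with at most `cap` documents per dataset.

    Hypothetical helper: a capped-downsampling sketch, not the paper's code.
    """
    rng = rng or random.Random()
    epoch: List[str] = []
    for docs in datasets.values():
        if len(docs) > cap:
            # Large dataset: draw a fresh random subset without replacement.
            epoch.extend(rng.sample(docs, cap))
        else:
            # Small dataset: keep every document.
            epoch.extend(docs)
    # Interleave documents from different datasets.
    rng.shuffle(epoch)
    return epoch


# Toy corpora; the sizes are illustrative, not the real dataset sizes.
toy = {
    "ontonotes": [f"on_{i}" for i in range(50)],
    "preco": [f"pc_{i}" for i in range(200)],
    "litbank": [f"lb_{i}" for i in range(8)],
}
epoch_docs = sample_epoch(toy, cap=10, rng=random.Random(0))
print(len(epoch_docs))  # 10 + 10 + 8 = 28
```

Calling `sample_epoch` again at the start of each epoch draws a new subset from the larger datasets, so over many epochs the model can still see most of their documents while each epoch stays balanced.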
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| LitBank | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 78.2 | — | Unverified |
| OntoNotes | longdoc S (ON + PreCo + LitBank + 30k pseudo-singletons) | F1 | 79.6 | — | Unverified |
| OntoNotes | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 79.2 | — | Unverified |
| OntoNotes | longdoc S (OntoNotes + 60k pseudo-singletons) | F1 | 80.6 | — | Unverified |
| PreCo | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 87.6 | — | Unverified |
| Quizbowl | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 42.9 | — | Unverified |
| WikiCoref | longdoc S (ON + PreCo + LitBank + 30k pseudo-singletons) | F1 | 62.5 | — | Unverified |
| WikiCoref | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 60.3 | — | Unverified |
| Winograd Schema Challenge | longdoc S (ON + PreCo + LitBank + 30k pseudo-singletons) | Accuracy | 59.4 | — | Unverified |
| Winograd Schema Challenge | longdoc S (OntoNotes + PreCo + LitBank) | Accuracy | 60.1 | — | Unverified |