Coreference Resolution
Datasets: Winograd Schema Challenge, OntoNotes, CoNLL-2012, GAP, DWIE, WikiCoref, CoNLL12, LitBank, OntoGUM, PreCo, STM-coref, XWinograd EN
Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | PaLM 540B (fine-tuned) | Accuracy | 100 | — | Unverified |
| 2 | Vega v2 6B (KD-based prompt transfer) | Accuracy | 98.6 | — | Unverified |
| 3 | UL2 20B (fine-tuned) | Accuracy | 98.1 | — | Unverified |
| 4 | Turing NLR v5 XXL 5.4B (fine-tuned) | Accuracy | 97.3 | — | Unverified |
| 5 | ST-MoE-32B 269B (fine-tuned) | Accuracy | 96.6 | — | Unverified |
| 6 | DeBERTa-1.5B | Accuracy | 95.9 | — | Unverified |
| 7 | T5-XXL 11B (fine-tuned) | Accuracy | 93.8 | — | Unverified |
| 8 | ST-MoE-L 4.1B (fine-tuned) | Accuracy | 93.3 | — | Unverified |
| 9 | RoBERTa-WinoGrande 355M | Accuracy | 90.1 | — | Unverified |
| 10 | Flan-T5 XXL (zero-shot) | Accuracy | 89.82 | — | Unverified |
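The Winograd Schema Challenge scores systems by plain accuracy: each item is a sentence containing an ambiguous pronoun and two candidate antecedents, and a prediction counts as correct if it selects the intended antecedent. A minimal scoring sketch in Python; the `WinogradSchema` fields and the `predict` hook are hypothetical placeholders, not tied to any particular release of the data:

```python
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class WinogradSchema:
    sentence: str                # "The trophy doesn't fit in the suitcase because it is too big."
    pronoun: str                 # "it"
    candidates: tuple[str, str]  # ("the trophy", "the suitcase")
    answer: int                  # index of the correct antecedent (here 0)

def wsc_accuracy(schemas: Sequence[WinogradSchema],
                 predict: Callable[[WinogradSchema], int]) -> float:
    """Fraction of schemas whose predicted antecedent index matches the gold one."""
    return sum(predict(s) == s.answer for s in schemas) / len(schemas)
```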

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Maverick_mes | F1 | 83.6 | — | Unverified |
| 2 | seq2seq | F1 | 83.3 | — | Unverified |
| 3 | ASP+T0-3B | F1 | 82.3 | — | Unverified |
| 4 | caw-coref + RoBERTa | F1 | 81.6 | — | Unverified |
| 5 | LingMess | F1 | 81.4 | — | Unverified |
| 6 | wl-coref + RoBERTa | F1 | 81 | — | Unverified |
| 7 | U-MEM + Longformer | F1 | 80.9 | — | Unverified |
| 8 | longdoc S (OntoNotes + 60k pseudo-singletons) | F1 | 80.6 | — | Unverified |
| 9 | G2GT SpanBERT-large reduced | F1 | 80.5 | — | Unverified |
| 10 | G2GT SpanBERT-large overlap | F1 | 80.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Maverick_mes | Avg F1 | 83.6 | — | Unverified |
| 2 | seq2seq | Avg F1 | 83.3 | — | Unverified |
| 3 | CorefQA + SpanBERT-large | Avg F1 | 83.1 | — | Unverified |
| 4 | ASP+T0-3B | Avg F1 | 82.3 | — | Unverified |
| 5 | wl-coref + RoBERTa | Avg F1 | 81 | — | Unverified |
| 6 | s2e + Longformer-Large | Avg F1 | 80.3 | — | Unverified |
| 7 | SpanBERT + Cluster Merging | Avg F1 | 80.2 | — | Unverified |
| 8 | c2f + SpanBERT-Large | Avg F1 | 80.2 | — | Unverified |
| 9 | CorefQA + SpanBERT-base | Avg F1 | 79.9 | — | Unverified |
| 10 | U-MEM* + SpanBERT-large | Avg F1 | 79.6 | — | Unverified |
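On the OntoNotes/CoNLL-2012 leaderboards, "Avg F1" is the standard CoNLL score: the unweighted mean of the F1 values of three coreference metrics, MUC, B³, and CEAF_φ4 (entity-based CEAF). A minimal sketch of that final averaging step, assuming the per-metric precision/recall pairs have already been produced by a scorer such as the official CoNLL scorer:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def conll_avg_f1(muc: tuple[float, float],
                 b_cubed: tuple[float, float],
                 ceaf_e: tuple[float, float]) -> float:
    """CoNLL score: unweighted mean of MUC, B-cubed, and CEAF_phi4 F1."""
    return sum(f1(p, r) for p, r in (muc, b_cubed, ceaf_e)) / 3
```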

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Coref-MTL | Overall F1 | 92.72 | — | Unverified |
| 2 | ProBERT | Overall F1 | 92.5 | — | Unverified |
| 3 | Maverick_incr | Overall F1 | 91.2 | — | Unverified |
| 4 | Full Ensemble | Overall F1 | 90.2 | — | Unverified |
| 5 | PeTra | F1 | 85.3 | — | Unverified |
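An "Overall F1" column is characteristic of GAP-style evaluation (gendered ambiguous pronoun resolution), which reports F1 on the masculine and feminine subsets separately plus an overall score over both; the overall figure is the micro-average, i.e. F1 computed from counts pooled across the two subsets. A minimal sketch, with the `(tp, fp, fn)` count triples as hypothetical inputs:

```python
def micro_f1(tp: int, fp: int, fn: int) -> float:
    """F1 computed directly from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def gap_overall_f1(masc: tuple[int, int, int],
                   fem: tuple[int, int, int]) -> float:
    """Pool (tp, fp, fn) counts from the two gendered subsets, then score."""
    tp, fp, fn = (m + f for m, f in zip(masc, fem))
    return micro_f1(tp, fp, fn)
```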

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Maverick_mes | F1 | 66.8 | — | Unverified |
| 2 | longdoc S (OntoNotes + PreCo + LitBank + 30k pseudo-singletons) | F1 | 62.5 | — | Unverified |
| 3 | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 60.3 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | DeepStruct multi-task w/ finetune | Average F1 | 73.1 | — | Unverified |
| 2 | DeepStruct multi-task | Average F1 | 60.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Maverick_incr | Avg F1 | 78.3 | — | Unverified |
| 2 | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 78.2 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Maverick_incr | F1 | 88 | — | Unverified |
| 2 | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 87.6 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | BFCR + SpanBERT + Transfer Learning | CoNLL F1 | 61.4 | — | Unverified |
| 2 | BFCR + SpanBERT | CoNLL F1 | 50.4 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | longdoc S (OntoNotes + PreCo + LitBank) | F1 | 42.9 | — | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | dali-full-anaphora | Avg F1 | 77.9 | — | Unverified |