Question Answering
Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
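To make the two metrics concrete, here is a minimal sketch of how EM and token-level F1 are commonly computed for extractive question answering. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the function names `normalize_answer`, `exact_match`, and `f1_score` are illustrative choices, and individual benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors SQuAD-style normalization; other benchmarks
    may use different rules.
    """
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("the tall Eiffel Tower", "Eiffel Tower"))
```

In this sketch, a prediction that differs from the reference only in case, punctuation, or articles still counts as an exact match, while a prediction with extra tokens gets partial credit under F1. Leaderboard scores are usually these per-question values averaged over the dataset and reported as percentages, often taking the maximum over several reference answers per question.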
Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | EM | 90.94 | — | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | — | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | — | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | — | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | — | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | — | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | — | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | — | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | — | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | — | Unverified |