Reading Comprehension
Most current question answering datasets frame the task as reading comprehension: the question is about a paragraph or document, and the answer is often a span in that document.
Specific reading comprehension tasks include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is commonly divided into four categories: cloze style, multiple choice, span prediction, and free-form answer.
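The span-prediction category above can be sketched concretely. A minimal, hypothetical example: a reader model assigns each passage token a start score and an end score, and the predicted answer is the highest-scoring valid span. The tokens and scores below are fabricated for illustration; real systems obtain these scores from a neural encoder.

```python
# Sketch of span prediction: pick the span (i, j) that maximizes
# start_scores[i] + end_scores[j], subject to j >= i and a length cap.

def best_span(start_scores, end_scores, max_len=8):
    """Return (start, end) token indices of the highest-scoring span."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

tokens = ["The", "treaty", "was", "signed", "in", "1919", "."]
start_scores = [0.1, 0.2, 0.0, 0.1, 0.3, 2.5, 0.0]  # fabricated logits
end_scores   = [0.0, 0.1, 0.0, 0.2, 0.1, 2.8, 0.1]  # fabricated logits

i, j = best_span(start_scores, end_scores)
print(" ".join(tokens[i : j + 1]))  # → 1919
```

Cloze-style and multiple-choice variants replace this span search with, respectively, filling a masked token and scoring each candidate answer option.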
Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.
The Machine Reading group at UCL also provides an overview of reading comprehension tasks.
Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Rational Reasoner / IDOL | Test | 80.6 | — | Unverified |
| 2 | AMR-LE-Ensemble | Test | 80 | — | Unverified |
| 3 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | — | Unverified |
| 4 | MERIt (MERIt-deberta-v2-xxlarge) | Test | 79.3 | — | Unverified |
| 5 | Knowledge model | Test | 79.2 | — | Unverified |
| 6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | — | Unverified |
| 7 | LReasoner ensemble | Test | 76.1 | — | Unverified |
| 8 | ELECTRA and ALBERT | Test | 71 | — | Unverified |
| 9 | WWZ | Test | 69.7 | — | Unverified |
| 10 | xlnet-large-uncased [extended data] | Test | 69.3 | — | Unverified |