Reading Comprehension
Most current question answering datasets frame the task as reading comprehension: the question is about a paragraph or document, and the answer is often a span within that document.
Specific reading comprehension tasks include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is commonly divided into four categories: cloze style, multiple choice, span prediction, and free-form answer.
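To make the span-prediction category concrete: a span-prediction model maps a (context, question) pair to start and end offsets of the answer inside the context. The sketch below is purely illustrative, assuming a toy keyword-overlap scorer; real readers (e.g. BERT-style models) learn start/end scores over tokens, and the function name and logic here are hypothetical.

```python
def predict_span(context: str, question: str) -> tuple[int, int]:
    """Toy span predictor: return (start, end) character offsets of the
    sentence in `context` that shares the most words with `question`."""
    stopwords = {"what", "is", "the", "a", "who", "where", "when"}
    q_words = set(question.lower().split()) - stopwords
    best, best_score = (0, 0), -1
    pos = 0
    for sentence in context.split(". "):
        score = len(q_words & set(sentence.lower().split()))
        if score > best_score:
            best_score = score
            best = (pos, pos + len(sentence))
        pos += len(sentence) + 2  # skip the ". " separator
    return best

context = "SQuAD was released in 2016. It contains over 100,000 questions."
start, end = predict_span(context, "When was SQuAD released?")
print(context[start:end])  # the predicted answer span
```

The key point is the output format: unlike multiple-choice or free-form settings, the prediction is a pair of offsets into the given document, so the answer is always a verbatim excerpt.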
Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.
The Machine Reading group at UCL also provides an overview of reading comprehension tasks.
Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | ALBERT (Ensemble) | Accuracy | 91.4 | — | Unverified |
| 2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | — | Unverified |
| 3 | ALBERT-xxlarge + DUMA (ensemble) | Accuracy | 89.8 | — | Unverified |
| 4 | Megatron-BERT | Accuracy | 89.5 | — | Unverified |
| 5 | XLNet | Accuracy (Middle) | 88.6 | — | Unverified |
| 6 | DeBERTa-large | Accuracy | 86.8 | — | Unverified |
| 7 | B10-10-10 | Accuracy | 85.7 | — | Unverified |
| 8 | RoBERTa | Accuracy | 83.2 | — | Unverified |
| 9 | Orca 2-13B | Accuracy | 82.87 | — | Unverified |
| 10 | Orca 2-7B | Accuracy | 80.79 | — | Unverified |