Reading Comprehension
Most current question answering datasets frame the task as reading comprehension: the question is asked about a paragraph or document, and the answer is often a span in that document.
Specific variants of the task include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is commonly divided into four categories: cloze style, multiple choice, span prediction, and free-form answer.
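The span-prediction formulation above can be sketched in a few lines: the model predicts start and end offsets into the passage, and the answer is the corresponding substring. The passage, question, and predicted offsets below are illustrative, not from any particular dataset.

```python
# Minimal sketch of span-prediction reading comprehension:
# the answer is recovered as a substring of the passage.
passage = ("Machine reading comprehension systems answer questions "
           "about a given passage of text.")
question = "What do machine reading comprehension systems answer?"

def answer_from_span(passage: str, start: int, end: int) -> str:
    """Recover the answer text from predicted character offsets."""
    return passage[start:end]

# Suppose a model predicted the span covering the word "questions".
start = passage.index("questions")
end = start + len("questions")
print(answer_from_span(passage, start, end))  # -> questions
```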
Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.
The Machine Reading group at UCL also provides an overview of reading comprehension tasks.
Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Golden Transformer | Average F1 | 0.94 | — | Unverified |
| 2 | MT5 Large | Average F1 | 0.84 | — | Unverified |
| 3 | ruRoberta-large finetune | Average F1 | 0.83 | — | Unverified |
| 4 | ruT5-large-finetune | Average F1 | 0.82 | — | Unverified |
| 5 | Human Benchmark | Average F1 | 0.81 | — | Unverified |
| 6 | ruT5-base-finetune | Average F1 | 0.77 | — | Unverified |
| 7 | ruBert-large finetune | Average F1 | 0.76 | — | Unverified |
| 8 | ruBert-base finetune | Average F1 | 0.74 | — | Unverified |
| 9 | RuGPT3XL few-shot | Average F1 | 0.74 | — | Unverified |
| 10 | RuGPT3Large | Average F1 | 0.73 | — | Unverified |
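The Average F1 scores in the table are typically token-level F1 computed between the predicted and gold answers, averaged over all questions. A common SQuAD-style sketch of that per-answer metric is below (the exact tokenization and normalization used by a given benchmark may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts tokens shared by prediction and reference.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Partial overlap: precision 2/3, recall 1.0, F1 0.8.
print(token_f1("the golden transformer", "golden transformer"))  # -> 0.8
```

A model's benchmark score is then the mean of this per-question F1 over the evaluation set.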