Reading Comprehension
Most current question answering datasets frame the task as reading comprehension: the question is asked about a paragraph or document, and the answer is often a span in that document.
Specific variants of the task include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is commonly divided into four categories: cloze style, multiple choice, span prediction, and free-form answer.
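The span-prediction formulation above can be sketched in a few lines: the model predicts start and end offsets into the passage, and the answer is the corresponding substring. The passage, question, and predicted offsets below are illustrative, not from any particular dataset.

```python
# Minimal sketch of span-prediction reading comprehension:
# the answer is recovered as a substring of the passage.
passage = ("Machine reading comprehension systems answer questions "
           "about a given passage of text.")
question = "What do machine reading comprehension systems answer?"

def answer_from_span(passage: str, start: int, end: int) -> str:
    """Recover the answer text from predicted character offsets."""
    return passage[start:end]

# Suppose a model predicted the span covering the word "questions".
start = passage.index("questions")
end = start + len("questions")
print(answer_from_span(passage, start, end))  # -> questions
```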
Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.
The Machine Reading group at UCL also provides an overview of reading comprehension tasks.
Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets
Benchmark Results
| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | Golden Transformer | Average F1 | 0.94 | — | Unverified |
| 2 | MT5 Large | Average F1 | 0.84 | — | Unverified |
| 3 | ruRoberta-large finetune | Average F1 | 0.83 | — | Unverified |
| 4 | ruT5-large-finetune | Average F1 | 0.82 | — | Unverified |
| 5 | Human Benchmark | Average F1 | 0.81 | — | Unverified |
| 6 | ruT5-base-finetune | Average F1 | 0.77 | — | Unverified |
| 7 | ruBert-large finetune | Average F1 | 0.76 | — | Unverified |
| 8 | ruBert-base finetune | Average F1 | 0.74 | — | Unverified |
| 9 | RuGPT3XL few-shot | Average F1 | 0.74 | — | Unverified |
| 10 | RuGPT3Large | Average F1 | 0.73 | — | Unverified |
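The Average F1 scores in the table are typically token-level F1 computed between the predicted and gold answers, averaged over all questions. A common SQuAD-style sketch of that per-answer metric is below (the exact tokenization and normalization used by a given benchmark may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """SQuAD-style token-level F1 between a predicted and a gold answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts tokens shared by prediction and reference.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Partial overlap: precision 2/3, recall 1.0, F1 0.8.
print(token_f1("the golden transformer", "golden transformer"))  # -> 0.8
```

A model's benchmark score is then the mean of this per-question F1 over the evaluation set.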