SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension, where the question is about a paragraph or document and the answer is often a span in that document.
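
For the span-prediction setting specifically, a minimal sketch using the Hugging Face transformers question-answering pipeline is shown below; the checkpoint name is just an illustrative choice, not a model taken from the results on this page.

```python
# Minimal extractive (span-prediction) QA sketch with Hugging Face
# transformers. Assumes `pip install transformers`; the checkpoint
# is an illustrative SQuAD-finetuned model, not one from this page.
from transformers import pipeline

qa = pipeline(
    "question-answering",
    model="distilbert-base-cased-distilled-squad",
)

context = (
    "Machine reading comprehension systems answer questions about a "
    "given passage, usually by selecting a span of the passage."
)
result = qa(question="How do the systems usually answer?", context=context)

# The pipeline returns the predicted span text plus its character
# offsets in the context and a confidence score.
print(result["answer"], result["start"], result["end"], result["score"])
```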

Specific reading comprehension tasks include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is divided into four categories: cloze style, multiple choice, span prediction, and free-form answer; illustrative instances of each format are sketched below.
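
To make the four formats concrete, here is a minimal sketch of what one instance of each might look like; the field names and example values are hypothetical, not drawn from any particular dataset.

```python
# Hypothetical instances illustrating the four machine reading
# comprehension formats; field names are illustrative only.

cloze = {
    "context": "The capital of France is ___.",
    "answer": "Paris",                 # fill in the blank
}

multiple_choice = {
    "context": "Passage text ...",
    "question": "Question text ...",
    "options": ["A ...", "B ...", "C ...", "D ..."],
    "answer": 2,                       # index of the correct option
}

span_prediction = {
    "context": "Passage text ...",
    "question": "Question text ...",
    "answer_span": (17, 24),           # character offsets into the context
}

free_form = {
    "context": "Passage text ...",
    "question": "Question text ...",
    "answer": "Any generated string",  # not restricted to context spans
}
```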

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.
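
As a hedged example of getting started with one of these benchmarks, RACE can be loaded through the Hugging Face datasets library; the configuration and field names below reflect the Hub version of the dataset and may differ in other releases.

```python
# Sketch of loading the RACE benchmark with Hugging Face `datasets`.
# Assumes `pip install datasets`; RACE ships "middle", "high", and
# "all" configurations on the Hub.
from datasets import load_dataset

race = load_dataset("race", "all")

ex = race["train"][0]
# Each item is a multiple-choice question over a passage: the passage
# ("article"), a question, four candidate answers ("options"), and
# the correct choice as a letter ("answer").
print(ex["question"])
print(ex["options"])
print(ex["answer"])
```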

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 226–250 of 1760 papers

Title | Status | Hype
EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia | - | 0
Towards Robust Text Retrieval with Progressive Learning | Code | 0
Orca 2: Teaching Small Language Models How to Reason | - | 0
Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension | - | 0
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization | Code | 1
What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception | Code | 0
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals | - | 0
Debate Helps Supervise Unreliable Experts | Code | 1
Thread of Thought Unraveling Chaotic Contexts | - | 0
Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension | - | 0
BizBench: A Quantitative Reasoning Benchmark for Business and Finance | - | 0
Mirror: A Universal Framework for Various Information Extraction Tasks | Code | 1
Assessing Distractors in Multiple-Choice Tests | - | 0
Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study | Code | 0
CreoleVal: Multilingual Multitask Benchmarks for Creoles | Code | 1
MPrompt: Exploring Multi-level Prompt Tuning for Machine Reading Comprehension | Code | 1
Multi-grained Evidence Inference for Multi-choice Reading Comprehension | - | 0
Can LLMs Grade Short-Answer Reading Comprehension Questions: An Empirical Study with a Novel Dataset | - | 0
Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers | Code | 0
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1
Evaluating Large Language Models on Controlled Generation Tasks | Code | 0
Explaining Interactions Between Text Spans | Code | 0
Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading | - | 0
Do Language Models Learn about Legal Entity Types during Pretraining? | Code | 0
Instructive Dialogue Summarization with Query Aggregations | Code | 0
Page 10 of 71

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Rational Reasoner / IDOL | Test | 80.6 | - | Unverified
2 | AMR-LE-Ensemble | Test | 80 | - | Unverified
3 | MERIt (MERIt-deberta-v2-xxlarge) | Test | 79.3 | - | Unverified
4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | - | Unverified
5 | Knowledge model | Test | 79.2 | - | Unverified
6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | - | Unverified
7 | LReasoner ensemble | Test | 76.1 | - | Unverified
8 | ELECTRA and ALBERT | Test | 71 | - | Unverified
9 | WWZ | Test | 69.7 | - | Unverified
10 | xlnet-large-uncased [extended data] | Test | 69.3 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | ALBERT (Ensemble) | Accuracy | 91.4 | - | Unverified
2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | - | Unverified
3 | ALBERT-xxlarge + DUMA (ensemble) | Accuracy | 89.8 | - | Unverified
4 | Megatron-BERT | Accuracy | 89.5 | - | Unverified
5 | XLNet | Accuracy (Middle) | 88.6 | - | Unverified
6 | DeBERTa-large | Accuracy | 86.8 | - | Unverified
7 | B10-10-10 | Accuracy | 85.7 | - | Unverified
8 | RoBERTa | Accuracy | 83.2 | - | Unverified
9 | Orca 2-13B | Accuracy | 82.87 | - | Unverified
10 | Orca 2-7B | Accuracy | 80.79 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | Golden Transformer | Average F1 | 0.94 | - | Unverified
2 | MT5 Large | Average F1 | 0.84 | - | Unverified
3 | ruRoberta-large finetune | Average F1 | 0.83 | - | Unverified
4 | ruT5-large-finetune | Average F1 | 0.82 | - | Unverified
5 | Human Benchmark | Average F1 | 0.81 | - | Unverified
6 | ruT5-base-finetune | Average F1 | 0.77 | - | Unverified
7 | ruBert-large finetune | Average F1 | 0.76 | - | Unverified
8 | ruBert-base finetune | Average F1 | 0.74 | - | Unverified
9 | RuGPT3XL few-shot | Average F1 | 0.74 | - | Unverified
10 | RuGPT3Large | Average F1 | 0.73 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | RoBERTa-Large | Overall: F1 | 64.4 | - | Unverified
2 | BERT-Large | Overall: F1 | 62.7 | - | Unverified
3 | BiDAF | Overall: F1 | 28.5 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BERT | MSE | 0.05 | - | Unverified

# | Model | Metric | Claimed | Verified | Status
1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | - | Unverified
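
Several of the leaderboards above report span-level F1. For reference, here is a minimal sketch of SQuAD-style token F1; real benchmarks typically add answer normalization (article and punctuation stripping, a max over multiple references) that is omitted here for brevity.

```python
# Minimal SQuAD-style token-level F1 between a predicted and a
# reference answer string. Omits the full SQuAD normalization
# (article/punctuation stripping, multi-reference max) for brevity.
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("a span in the document", "span in the document"))  # ≈ 0.889
```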