SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension, where the question is about a paragraph or document and the answer is often a span in the document.
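To make the idea of a "span in the document" concrete, here is a minimal sketch of how an extractive example might be represented. The `SpanExample` class, its field names, and the `make_example` helper are hypothetical and used only for illustration; they are not the schema of any particular dataset.

```python
from dataclasses import dataclass


@dataclass
class SpanExample:
    """One extractive reading-comprehension example (hypothetical schema)."""
    context: str
    question: str
    answer_start: int  # character offset where the answer begins (inclusive)
    answer_end: int    # character offset where the answer ends (exclusive)

    def answer_text(self) -> str:
        # The answer is not stored separately; it is sliced out of the context.
        return self.context[self.answer_start:self.answer_end]


def make_example(context: str, question: str, answer: str) -> SpanExample:
    """Find the first occurrence of `answer` in `context` and record its offsets."""
    start = context.find(answer)
    if start == -1:
        raise ValueError("answer is not a span of the context")
    return SpanExample(context, question, start, start + len(answer))


example = make_example(
    context="The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    question="When was the Eiffel Tower completed?",
    answer="1889",
)
print(example.answer_start, example.answer_end, example.answer_text())
```

Storing offsets rather than the answer string is one common design choice: a span-prediction model can then be trained to output a start and an end position, and the answer text is recovered by slicing the context.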

Some specific tasks of reading comprehension include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension can be divided into four categories: cloze style, multiple choice, span prediction, and free-form answer. Read more about each category here.

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 401-450 of 1760 papers

| Title | Status | Hype |
| --- | --- | --- |
| MRC-based Nested Medical NER with Co-prediction and Adaptive Pre-training | | 0 |
| Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents | | 0 |
| Knowledge Condensation and Reasoning for Knowledge-based VQA | | 0 |
| CuentosIE: can a chatbot about "tales with a message" help to teach emotional intelligence? | | 0 |
| Towards a Psychology of Machines: Large Language Models Predict Human Memory | | 0 |
| Video Relationship Detection Using Mixture of Experts | Code | 0 |
| SaulLM-7B: A pioneering Large Language Model for Law | | 0 |
| AceMap: Knowledge Discovery through Academic Graph | | 0 |
| Predicting Learning Performance with Large Language Models: A Study in Adult Literacy | | 0 |
| Choose Your Own Adventure: Interactive E-Books to Improve Word Knowledge and Comprehension Skills | | 0 |
| Do Large Language Models Mirror Cognitive Language Processing? | | 0 |
| Treatment effects without multicollinearity? Temporal order and the Gram-Schmidt process in causal inference | Code | 0 |
| QASE Enhanced PLMs: Improved Control in Text Generation for MRC | | 0 |
| CliqueParcel: An Approach For Batching LLM Prompts That Jointly Optimizes Efficiency And Faithfulness | | 0 |
| A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts | | 0 |
| Are Machines Better at Complex Reasoning? Unveiling Human-Machine Inference Gaps in Entailment Verification | | 0 |
| VlogQA: Task, Dataset, and Baseline Models for Vietnamese Spoken-Based Machine Reading Comprehension | Code | 0 |
| MULTI: Multimodal Understanding Leaderboard with Text and Images | | 0 |
| SOCIALITE-LLAMA: An Instruction-Tuned Model for Social Scientific Tasks | | 0 |
| An Information-Theoretic Approach to Analyze NLP Classification Tasks | Code | 0 |
| Do Language Models Exhibit the Same Cognitive Biases in Problem Solving as Human Learners? | Code | 0 |
| Paramanu: A Family of Novel Efficient Generative Foundation Language Models for Indian Languages | | 0 |
| Evaluating Gender Bias in Large Language Models via Chain-of-Thought Prompting | | 0 |
| Persona-centric Metamorphic Relation guided Robustness Evaluation for Multi-turn Dialogue Modelling | | 0 |
| Majority or Minority: Data Imbalance Learning Method for Named Entity Recognition | | 0 |
| Power in Numbers: Robust reading comprehension by finetuning with four adversarial sentences per example | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |
| Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models | | 0 |
| Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering | | 0 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0 |
| The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey | | 0 |
| Improving Domain Adaptation through Extended-Text Reading Comprehension | | 0 |
| Structsum Generation for Faster Text Comprehension | | 0 |
| Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing | | 0 |
| COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training | | 0 |
| Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension | Code | 0 |
| Probing Pretrained Language Models with Hierarchy Properties | | 0 |
| High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models | | 0 |
| Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales | Code | 0 |
| Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need | | 0 |
| GYM at Qur’an QA 2023 Shared Task: Multi-Task Transfer Learning for Quranic Passage Retrieval and Question Answering with Large Language Models | Code | 0 |
| Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification | | 0 |
| Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | | 0 |
| EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia | | 0 |
| Towards Robust Text Retrieval with Progressive Learning | Code | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension | | 0 |
| Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals | | 0 |
| What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception | Code | 0 |
| Thread of Thought Unraveling Chaotic Contexts | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Rational Reasoner / IDOL | Test | 80.6 | | Unverified |
| 2 | AMR-LE-Ensemble | Test | 80 | | Unverified |
| 3 | MERIt(MERIt-deberta-v2-xxlarge ) | Test | 79.3 | | Unverified |
| 4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | | Unverified |
| 5 | Knowledge model | Test | 79.2 | | Unverified |
| 6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | | Unverified |
| 7 | LReasoner ensemble | Test | 76.1 | | Unverified |
| 8 | ELECTRA and ALBERT | Test | 71 | | Unverified |
| 9 | WWZ | Test | 69.7 | | Unverified |
| 10 | xlnet-large-uncased [extended data] | Test | 69.3 | | Unverified |
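
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ALBERT (Ensemble) | Accuracy | 91.4 | | Unverified |
| 2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | | Unverified |
| 3 | ALBERTxxlarge+DUMA(ensemble) | Accuracy | 89.8 | | Unverified |
| 4 | Megatron-BERT | Accuracy | 89.5 | | Unverified |
| 5 | XLNet | Accuracy (Middle) | 88.6 | | Unverified |
| 6 | DeBERTalarge | Accuracy | 86.8 | | Unverified |
| 7 | B10-10-10 | Accuracy | 85.7 | | Unverified |
| 8 | RoBERTa | Accuracy | 83.2 | | Unverified |
| 9 | Orca 2-13B | Accuracy | 82.87 | | Unverified |
| 10 | Orca 2-7B | Accuracy | 80.79 | | Unverified |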
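
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Golden Transformer | Average F1 | 0.94 | | Unverified |
| 2 | MT5 Large | Average F1 | 0.84 | | Unverified |
| 3 | ruRoberta-large finetune | Average F1 | 0.83 | | Unverified |
| 4 | ruT5-large-finetune | Average F1 | 0.82 | | Unverified |
| 5 | Human Benchmark | Average F1 | 0.81 | | Unverified |
| 6 | ruT5-base-finetune | Average F1 | 0.77 | | Unverified |
| 7 | ruBert-large finetune | Average F1 | 0.76 | | Unverified |
| 8 | ruBert-base finetune | Average F1 | 0.74 | | Unverified |
| 9 | RuGPT3XL few-shot | Average F1 | 0.74 | | Unverified |
| 10 | RuGPT3Large | Average F1 | 0.73 | | Unverified |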
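
| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RoBERTa-Large | Overall: F1 | 64.4 | | Unverified |
| 2 | BERT-Large | Overall: F1 | 62.7 | | Unverified |
| 3 | BiDAF | Overall: F1 | 28.5 | | Unverified |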

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT | MSE | 0.05 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | | Unverified |