SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension, where the question is about a paragraph or document and the answer is often a span in the document.
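To make the span framing concrete, the following is a minimal sketch of how a span answer can be represented and scored. It assumes character offsets into the document and a token-overlap F1 similar in spirit to the F1 metrics listed in the benchmark tables; the passage, the question, and the function names `extract_span` and `token_f1` are hypothetical illustrations, not part of any dataset or library mentioned on this page.

```python
from collections import Counter


def extract_span(document: str, start: int, end: int) -> str:
    """Return the answer text for character offsets [start, end)."""
    return document[start:end]


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted span and a reference span.

    Assumption: whitespace tokenization and lowercasing only; real
    benchmarks usually apply extra normalization (punctuation, articles).
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both spans.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up example passage and question, for illustration only.
document = "The Eiffel Tower was completed in 1889 in Paris."
question = "When was the Eiffel Tower completed?"

start = document.index("1889")
answer = extract_span(document, start, start + len("1889"))

exact_match = answer == "1889"
partial_score = token_f1("in 1889", "1889")
```

Exact match rewards only identical spans, while token-level F1 gives partial credit when the predicted span overlaps the reference, which is why span-prediction benchmarks often report both.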

Some specific tasks of reading comprehension include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension can be divided into four categories: cloze style, multiple choice, span prediction, and free-form answer. Read more about each category here.

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 201-250 of 1760 papers

| Title | Status | Hype |
| --- | --- | --- |
| Large Language Models are Superpositions of All Characters: Attaining Arbitrary Role-play via Self-Alignment | Code | 3 |
| Persona-centric Metamorphic Relation guided Robustness Evaluation for Multi-turn Dialogue Modelling | | 0 |
| Majority or Minority: Data Imbalance Learning Method for Named Entity Recognition | | 0 |
| Knowledge Fusion of Large Language Models | Code | 4 |
| Power in Numbers: Robust reading comprehension by finetuning with four adversarial sentences per example | | 0 |
| Contrastive Perplexity for Controlled Generation: An Application in Detoxifying Large Language Models | | 0 |
| Large Language Models are Null-Shot Learners | | 0 |
| The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey | | 0 |
| Towards Efficient Methods in Medical Question Answering using Knowledge Graph Embeddings | Code | 0 |
| Developing ChatGPT for Biology and Medicine: A Complete Review of Biomedical Question Answering | | 0 |
| Improving Domain Adaptation through Extended-Text Reading Comprehension | Code | 0 |
| Structsum Generation for Faster Text Comprehension | | 0 |
| Attendre: Wait To Attend By Retrieval With Evicted Queries in Memory-Based Transformers for Long Context Processing | | 0 |
| COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training | | 0 |
| LatestEval: Addressing Data Contamination in Language Model Evaluation through Dynamic and Time-Sensitive Test Construction | Code | 1 |
| Do Text Simplification Systems Preserve Meaning? A Human Evaluation via Reading Comprehension | Code | 0 |
| Probing Pretrained Language Models with Hierarchy Properties | | 0 |
| High-throughput Biomedical Relation Extraction for Semi-Structured Web Articles Empowered by Large Language Models | | 0 |
| Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales | Code | 0 |
| Generative Large Language Models Are All-purpose Text Analytics Engines: Text-to-text Learning Is All Your Need | | 0 |
| GYM at Qur’an QA 2023 Shared Task: Multi-Task Transfer Learning for Quranic Passage Retrieval and Question Answering with Large Language Models | Code | 0 |
| Think from Words(TFW): Initiating Human-Like Cognition in Large Language Models Through Think from Words for Japanese Text-level Classification | | 0 |
| GPT4Point: A Unified Framework for Point-Language Understanding and Generation | Code | 2 |
| Let the LLMs Talk: Simulating Human-to-Human Conversational QA via Zero-Shot LLM-to-LLM Interactions | Code | 1 |
| Evaluating the Rationale Understanding of Critical Reasoning in Logical Reading Comprehension | | 0 |
| EEG Connectivity Analysis Using Denoising Autoencoders for the Detection of Dyslexia | Code | 0 |
| Towards Robust Text Retrieval with Progressive Learning | Code | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| Complementary Advantages of ChatGPTs and Human Readers in Reasoning: Evidence from English Text Reading Comprehension | | 0 |
| Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization | Code | 1 |
| What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception | Code | 0 |
| Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals | | 0 |
| Debate Helps Supervise Unreliable Experts | Code | 1 |
| Thread of Thought Unraveling Chaotic Contexts | | 0 |
| Sharing, Teaching and Aligning: Knowledgeable Transfer Learning for Cross-Lingual Machine Reading Comprehension | | 0 |
| BizBench: A Quantitative Reasoning Benchmark for Business and Finance | | 0 |
| Mirror: A Universal Framework for Various Information Extraction Tasks | Code | 1 |
| Assessing Distractors in Multiple-Choice Tests | | 0 |
| Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study | Code | 0 |
| CreoleVal: Multilingual Multitask Benchmarks for Creoles | Code | 1 |
| MPrompt: Exploring Multi-level Prompt Tuning for Machine Reading Comprehension | Code | 1 |
| Multi-grained Evidence Inference for Multi-choice Reading Comprehension | | 0 |
| Can LLMs Grade Short-Answer Reading Comprehension Questions : An Empirical Study with a Novel Dataset | | 0 |
| Guiding LLM to Fool Itself: Automatically Manipulating Machine Reading Comprehension Shortcut Triggers | Code | 0 |
| DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading | Code | 1 |
| Evaluating Large Language Models on Controlled Generation Tasks | Code | 0 |
| Explaining Interactions Between Text Spans | Code | 0 |
| Explicit Alignment and Many-to-many Entailment Based Reasoning for Conversational Machine Reading | | 0 |
| Do Language Models Learn about Legal Entity Types during Pretraining? | Code | 0 |
| Instructive Dialogue Summarization with Query Aggregations | Code | 0 |

Benchmark Results

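| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Rational Reasoner / IDOL | Test | 80.6 | | Unverified |
| 2 | AMR-LE-Ensemble | Test | 80 | | Unverified |
| 3 | MERIt(MERIt-deberta-v2-xxlarge) | Test | 79.3 | | Unverified |
| 4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | | Unverified |
| 5 | Knowledge model | Test | 79.2 | | Unverified |
| 6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | | Unverified |
| 7 | LReasoner ensemble | Test | 76.1 | | Unverified |
| 8 | ELECTRA and ALBERT | Test | 71 | | Unverified |
| 9 | WWZ | Test | 69.7 | | Unverified |
| 10 | xlnet-large-uncased [extended data] | Test | 69.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ALBERT (Ensemble) | Accuracy | 91.4 | | Unverified |
| 2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | | Unverified |
| 3 | ALBERTxxlarge+DUMA(ensemble) | Accuracy | 89.8 | | Unverified |
| 4 | Megatron-BERT | Accuracy | 89.5 | | Unverified |
| 5 | XLNet | Accuracy (Middle) | 88.6 | | Unverified |
| 6 | DeBERTalarge | Accuracy | 86.8 | | Unverified |
| 7 | B10-10-10 | Accuracy | 85.7 | | Unverified |
| 8 | RoBERTa | Accuracy | 83.2 | | Unverified |
| 9 | Orca 2-13B | Accuracy | 82.87 | | Unverified |
| 10 | Orca 2-7B | Accuracy | 80.79 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Golden Transformer | Average F1 | 0.94 | | Unverified |
| 2 | MT5 Large | Average F1 | 0.84 | | Unverified |
| 3 | ruRoberta-large finetune | Average F1 | 0.83 | | Unverified |
| 4 | ruT5-large-finetune | Average F1 | 0.82 | | Unverified |
| 5 | Human Benchmark | Average F1 | 0.81 | | Unverified |
| 6 | ruT5-base-finetune | Average F1 | 0.77 | | Unverified |
| 7 | ruBert-large finetune | Average F1 | 0.76 | | Unverified |
| 8 | ruBert-base finetune | Average F1 | 0.74 | | Unverified |
| 9 | RuGPT3XL few-shot | Average F1 | 0.74 | | Unverified |
| 10 | RuGPT3Large | Average F1 | 0.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RoBERTa-Large | Overall: F1 | 64.4 | | Unverified |
| 2 | BERT-Large | Overall: F1 | 62.7 | | Unverified |
| 3 | BiDAF | Overall: F1 | 28.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT | MSE | 0.05 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | | Unverified |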