SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension, where the question is about a paragraph or document and the answer is often a span of that document.

Specific variants of reading comprehension include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension is divided into four categories: cloze style, multiple choice, span prediction, and free-form answer.
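The span-prediction framing above can be made concrete without any model: in SQuAD-style datasets, the answer must be a contiguous span of the context, so it is represented by character offsets into the passage. The sketch below is a minimal, library-free illustration of that data format; `find_answer_span` is a hypothetical helper for this example, not part of any dataset's API.

```python
def find_answer_span(context: str, answer: str):
    """Return (start, end) character offsets of `answer` in `context`,
    or None if the answer is not a contiguous span of the context.
    This mirrors how span-prediction datasets store answers."""
    start = context.find(answer)
    if start == -1:
        return None
    return start, start + len(answer)

context = ("Machine reading comprehension asks a model to answer a "
           "question about a given passage.")
question = "What does machine reading comprehension ask a model to do?"

span = find_answer_span(context, "answer a question about a given passage")
assert span is not None
start, end = span
print(context[start:end])  # prints the extracted answer span
```

Cloze-style and multiple-choice datasets differ only in the label: a masked token to fill in, or an index into a fixed set of candidate answers, rather than a (start, end) pair.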

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 1601–1650 of 1760 papers

Title | Status | Hype
A Nil-Aware Answer Extraction Framework for Question Answering | Code | 0
Knowledge Aware Conversation Generation with Explainable Reasoning over Augmented Graphs | Code | 0
Two-Stage Synthesis Networks for Transfer Learning in Machine Comprehension | Code | 0
Video Relationship Detection Using Mixture of Experts | Code | 0
X-WikiRE: A Large, Multilingual Resource for Relation Extraction as Machine Comprehension | Code | 0
CoQA: A Conversational Question Answering Challenge | Code | 0
Knowledge-Empowered Representation Learning for Chinese Medical Reading Comprehension: Task, Model and Resources | Code | 0
End-to-End Open-Domain Question Answering with BERTserini | Code | 0
Knowledge-Guided Linguistic Rewrites for Inference Rule Verification | Code | 0
EMBRACE: Evaluation and Modifications for Boosting RACE | Code | 0
ElimiNet: A Model for Eliminating Options for Reading Comprehension with Multiple Choice Questions | Code | 0
Beyond English-Only Reading Comprehension: Experiments in Zero-Shot Multilingual Transfer for Bulgarian | Code | 0
KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding | Code | 0
SDNet: Contextualized Attention-based Deep Network for Conversational Question Answering | Code | 0
OPERA: Operation-Pivoted Discrete Reasoning over Text | Code | 0
Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading | Code | 0
SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine | Code | 0
SECTOR: A Neural Model for Coherent Topic Segmentation and Classification | Code | 0
Text Understanding with the Attention Sum Reader Network | Code | 0
Seemingly Plausible Distractors in Multi-Hop Reasoning: Are Large Language Models Attentive Readers? | Code | 0
Eidos, INDRA, & Delphi: From Free Text to Executable Causal Models | Code | 0
Efficient Tuning of Large Language Models for Knowledge-Grounded Dialogue Generation | Code | 0
Large Language Models are Clinical Reasoners: Reasoning-Aware Diagnosis Framework with Prompt-Generated Rationales | Code | 0
Efficient and Robust Question Answering from Minimal Context over Documents | Code | 0
Conversational Machine Reading Comprehension for Vietnamese Healthcare Texts | Code | 0
Effect of Visual Extensions on Natural Language Understanding in Vision-and-Language Models | Code | 0
What if you said that differently?: How Explanation Formats Affect Human Feedback Efficacy and User Perception | Code | 0
Select, Answer and Explain: Interpretable Multi-hop Reading Comprehension over Multiple Documents | Code | 0
Large-scale Multi-granular Concept Extraction Based on Machine Reading Comprehension | Code | 0
Latent Alignment of Procedural Concepts in Multimodal Recipes | Code | 0
Controlled and Balanced Dataset for Japanese Lexical Simplification | Code | 0
Effective Subword Segmentation for Text Comprehension | Code | 0
Educational Multi-Question Generation for Reading Comprehension | Code | 0
Are you tough enough? Framework for Robustness Validation of Machine Comprehension Systems | Code | 0
Contextualized Word Representations for Reading Comprehension | Code | 0
CoMuMDR: Code-mixed Multi-modal Multi-domain corpus for Discourse paRsing in conversations | Code | 0
Compositional Questions Do Not Necessitate Multi-hop Reasoning | Code | 0
Zero-Shot Complex Question-Answering on Long Scientific Documents | Code | 0
EchoPrompt: Instructing the Model to Rephrase Queries for Improved In-context Learning | Code | 0
Dynamic Chunking and Selection for Reading Comprehension of Ultra-Long Context in Large Language Models | Code | 0
DuReader_robust: A Chinese Dataset Towards Evaluating Robustness and Generalization of Machine Reading Comprehension in Real-World Applications | Code | 0
BERT Based Multilingual Machine Comprehension in English and Hindi | Code | 0
Learning Graph Representation of Agent Diffusers | Code | 0
Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension | Code | 0
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities | Code | 0
Self Question-answering: Aspect-based Sentiment Analysis by Role Flipped Machine Reading Comprehension | Code | 0
Learning Recurrent Span Representations for Extractive Question Answering | Code | 0
SELF: Self-Extend the Context Length With Logistic Growth Function | Code | 0
Learning Semantic Sentence Embeddings using Sequential Pair-wise Discriminator | Code | 0
DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications | Code | 0
Page 33 of 36

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Rational Reasoner / IDOL | Test | 80.6 | — | Unverified
2 | AMR-LE-Ensemble | Test | 80 | — | Unverified
3 | MERIt (MERIt-deberta-v2-xxlarge) | Test | 79.3 | — | Unverified
4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | — | Unverified
5 | Knowledge model | Test | 79.2 | — | Unverified
6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | — | Unverified
7 | LReasoner ensemble | Test | 76.1 | — | Unverified
8 | ELECTRA and ALBERT | Test | 71 | — | Unverified
9 | WWZ | Test | 69.7 | — | Unverified
10 | xlnet-large-uncased [extended data] | Test | 69.3 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ALBERT (Ensemble) | Accuracy | 91.4 | — | Unverified
2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | — | Unverified
3 | ALBERT-xxlarge+DUMA (ensemble) | Accuracy | 89.8 | — | Unverified
4 | Megatron-BERT | Accuracy | 89.5 | — | Unverified
5 | XLNet | Accuracy (Middle) | 88.6 | — | Unverified
6 | DeBERTa-large | Accuracy | 86.8 | — | Unverified
7 | B10-10-10 | Accuracy | 85.7 | — | Unverified
8 | RoBERTa | Accuracy | 83.2 | — | Unverified
9 | Orca 2-13B | Accuracy | 82.87 | — | Unverified
10 | Orca 2-7B | Accuracy | 80.79 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Golden Transformer | Average F1 | 0.94 | — | Unverified
2 | MT5 Large | Average F1 | 0.84 | — | Unverified
3 | ruRoberta-large finetune | Average F1 | 0.83 | — | Unverified
4 | ruT5-large-finetune | Average F1 | 0.82 | — | Unverified
5 | Human Benchmark | Average F1 | 0.81 | — | Unverified
6 | ruT5-base-finetune | Average F1 | 0.77 | — | Unverified
7 | ruBert-large finetune | Average F1 | 0.76 | — | Unverified
8 | ruBert-base finetune | Average F1 | 0.74 | — | Unverified
9 | RuGPT3XL few-shot | Average F1 | 0.74 | — | Unverified
10 | RuGPT3Large | Average F1 | 0.73 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | RoBERTa-Large | Overall: F1 | 64.4 | — | Unverified
2 | BERT-Large | Overall: F1 | 62.7 | — | Unverified
3 | BiDAF | Overall: F1 | 28.5 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BERT | MSE | 0.05 | — | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | — | Unverified