SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension: the question is about a paragraph or document, and the answer is often a span in that document.
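As a minimal sketch of the span framing, an example can be stored as a context, a question, and a pair of character offsets into the context. The class name `SpanExample`, its fields, and the toy passage are illustrative assumptions and are not taken from any particular dataset.

```python
from dataclasses import dataclass


@dataclass
class SpanExample:
    """Hypothetical container for one span-style reading comprehension item."""
    context: str       # the paragraph or document
    question: str      # a question about the context
    answer_start: int  # character offset where the answer begins (inclusive)
    answer_end: int    # character offset where the answer ends (exclusive)

    def answer_text(self) -> str:
        # The answer is recovered by slicing the context, not stored separately.
        return self.context[self.answer_start:self.answer_end]


# Toy passage invented for illustration.
context = "Marie planted tulips in the garden on Sunday."
answer = "tulips"
start = context.find(answer)

example = SpanExample(
    context=context,
    question="What did Marie plant?",
    answer_start=start,
    answer_end=start + len(answer),
)
print(example.answer_text())
```

Storing offsets rather than the answer string keeps the answer tied to one specific occurrence in the context, which matters when the same phrase appears more than once.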

Some specific tasks of reading comprehension include multi-modal machine reading comprehension and textual machine reading comprehension, among others. In the literature, machine reading comprehension can be divided into four categories: cloze style, multiple choice, span prediction, and free-form answer. Read more about each category here.
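The four categories differ mainly in the shape of the expected answer. The sketch below is one hedged way to express that difference. The names `Category`, `Example`, and `is_well_formed`, the `_____` blank marker, and the toy passage are all assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Category(Enum):
    CLOZE = "cloze style"
    MULTIPLE_CHOICE = "multiple choice"
    SPAN_PREDICTION = "span prediction"
    FREE_FORM = "free-form answer"


@dataclass
class Example:
    category: Category
    context: str
    question: str
    answer: str
    options: List[str] = field(default_factory=list)  # used by multiple choice only


def is_well_formed(ex: Example) -> bool:
    """Check that the answer has the shape its category expects."""
    if ex.category is Category.CLOZE:
        # The question contains a blank that the answer fills in.
        return "_____" in ex.question and bool(ex.answer)
    if ex.category is Category.MULTIPLE_CHOICE:
        # The answer is one of the candidate options.
        return ex.answer in ex.options
    if ex.category is Category.SPAN_PREDICTION:
        # The answer is a contiguous piece of the context.
        return ex.answer in ex.context
    # Free-form: any non-empty text, not necessarily found in the context.
    return bool(ex.answer)


# Toy passage invented for illustration.
CONTEXT = "Marie planted tulips in the garden on Sunday."

examples = [
    Example(Category.CLOZE, CONTEXT, "Marie planted _____ in the garden.", "tulips"),
    Example(Category.MULTIPLE_CHOICE, CONTEXT, "What did Marie plant?", "tulips",
            options=["roses", "tulips", "daisies"]),
    Example(Category.SPAN_PREDICTION, CONTEXT, "Where did Marie plant tulips?",
            "in the garden"),
    Example(Category.FREE_FORM, CONTEXT, "What was Marie doing on Sunday?",
            "She was gardening."),
]

for ex in examples:
    print(ex.category.value, is_well_formed(ex))
```

Note that the free-form answer here does not appear verbatim in the context, which is what separates it from span prediction.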

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 901–950 of 1760 papers

| Title | Status | Hype |
| --- | --- | --- |
| Retrieval-guided Counterfactual Generation for QA | | 0 |
| Retrieval-guided Counterfactual Generation for QA | | 0 |
| Retrieve-and-Read: Multi-task Learning of Information Retrieval and Reading Comprehension | | 0 |
| Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering | | 0 |
| Revealing Weaknesses of Vietnamese Language Models Through Unanswerable Questions in Machine Reading Comprehension | | 0 |
| Revisiting the Open-Domain Question Answering Pipeline | | 0 |
| Robust Domain Adaptation for Machine Reading Comprehension | | 0 |
| Robustly Optimized and Distilled Training for Natural Language Understanding | | 0 |
| Robust Machine Comprehension Models via Adversarial Training | | 0 |
| Robust Machine Reading Comprehension by Learning Soft labels | | 0 |
| Robust Reading Comprehension with Linguistic Constraints via Posterior Regularization | | 0 |
| Robust Semantics for Semantic Parsing | | 0 |
| Roof-Transformer: Divided and Joined Understanding with Knowledge Enhancement | | 0 |
| Roof-BERT: Divide Understanding Labour and Join in Work | | 0 |
| Ruminating Reader: Reasoning with Gated Multi-Hop Attention | | 0 |
| Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models | | 0 |
| RVISA: Reasoning and Verification for Implicit Sentiment Analysis | | 0 |
| S2ST-Omni: An Efficient and Scalable Multilingual Speech-to-Speech Translation Framework via Seamless Speech-Text Alignment and Streaming Speech Generation | | 0 |
| Samajh-Boojh: A Reading Comprehension system in Hindi | | 0 |
| SANTO: A Web-based Annotation Tool for Ontology-driven Slot Filling | | 0 |
| SaulLM-7B: A pioneering Large Language Model for Law | | 0 |
| SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis | | 0 |
| Scalable Neural Theorem Proving on Knowledge Bases and Natural Language | | 0 |
| Scene Restoring for Narrative Machine Reading Comprehension | | 0 |
| ScholarlyRead: A New Dataset for Scientific Article Reading Comprehension | | 0 |
| Scientific Discovery as Link Prediction in Influence and Citation Graphs | | 0 |
| SciMRC: Multi-perspective Scientific Machine Reading Comprehension | | 0 |
| SCOP: Evaluating the Comprehension Process of Large Language Models from a Cognitive View | | 0 |
| Scoping natural language processing in Indonesian and Malay for education applications | | 0 |
| Seeing the World through Text: Evaluating Image Descriptions for Commonsense Reasoning in Machine Reading Comprehension | | 0 |
| Selecting Domain-Specific Concepts for Question Generation With Lightly-Supervised Methods | | 0 |
| Selective Self-to-Supervised Fine-Tuning for Generalization in Large Language Models | | 0 |
| Self-Attentive Constituency Parsing for UCCA-based Semantic Parsing | | 0 |
| Self-Supervised Test-Time Learning for Reading Comprehension | | 0 |
| Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data | | 0 |
| Semantic Features Based on Word Alignments for Estimating Quality of Text Simplification | | 0 |
| Semantic Framework for Comparison Structures in Natural Language | | 0 |
| Semantics-Aware Inferential Network for Natural Language Understanding | | 0 |
| Semantics-Preserved Distortion for Personal Privacy Protection in Information Management | | 0 |
| SemEval-2012 Task 6: A Pilot on Semantic Textual Similarity | | 0 |
| SemEval-2018 Task 11: Machine Comprehension Using Commonsense Knowledge | | 0 |
| Semi-automatic Generation of Multiple-Choice Tests from Mentions of Semantic Relations | | 0 |
| Semi-Supervised Clustering for Short Answer Scoring | | 0 |
| Semi-supervised Training Data Generation for Multilingual Question Answering | | 0 |
| Sense-Specific Lexical Information for Reading Assistance | | 0 |
| Sentence Complexity Estimation for Chinese-speaking Learners of Japanese | | 0 |
| Sentence Extraction-Based Machine Reading Comprehension for Vietnamese | | 0 |
| Separating Answers from Queries for Neural Reading Comprehension | | 0 |
| Sequence Model with Self-Adaptive Sliding Window for Efficient Spoken Document Segmentation | | 0 |
| Sequential Attention: A Context-Aware Alignment Function for Machine Reading | | 0 |

Benchmark Results

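| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Rational Reasoner / IDOL | Test | 80.6 | | Unverified |
| 2 | AMR-LE-Ensemble | Test | 80 | | Unverified |
| 3 | MERIt(MERIt-deberta-v2-xxlarge) | Test | 79.3 | | Unverified |
| 4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | | Unverified |
| 5 | Knowledge model | Test | 79.2 | | Unverified |
| 6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | | Unverified |
| 7 | LReasoner ensemble | Test | 76.1 | | Unverified |
| 8 | ELECTRA and ALBERT | Test | 71 | | Unverified |
| 9 | WWZ | Test | 69.7 | | Unverified |
| 10 | xlnet-large-uncased [extended data] | Test | 69.3 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | ALBERT (Ensemble) | Accuracy | 91.4 | | Unverified |
| 2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | | Unverified |
| 3 | ALBERTxxlarge+DUMA(ensemble) | Accuracy | 89.8 | | Unverified |
| 4 | Megatron-BERT | Accuracy | 89.5 | | Unverified |
| 5 | XLNet | Accuracy (Middle) | 88.6 | | Unverified |
| 6 | DeBERTalarge | Accuracy | 86.8 | | Unverified |
| 7 | B10-10-10 | Accuracy | 85.7 | | Unverified |
| 8 | RoBERTa | Accuracy | 83.2 | | Unverified |
| 9 | Orca 2-13B | Accuracy | 82.87 | | Unverified |
| 10 | Orca 2-7B | Accuracy | 80.79 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | Golden Transformer | Average F1 | 0.94 | | Unverified |
| 2 | MT5 Large | Average F1 | 0.84 | | Unverified |
| 3 | ruRoberta-large finetune | Average F1 | 0.83 | | Unverified |
| 4 | ruT5-large-finetune | Average F1 | 0.82 | | Unverified |
| 5 | Human Benchmark | Average F1 | 0.81 | | Unverified |
| 6 | ruT5-base-finetune | Average F1 | 0.77 | | Unverified |
| 7 | ruBert-large finetune | Average F1 | 0.76 | | Unverified |
| 8 | ruBert-base finetune | Average F1 | 0.74 | | Unverified |
| 9 | RuGPT3XL few-shot | Average F1 | 0.74 | | Unverified |
| 10 | RuGPT3Large | Average F1 | 0.73 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | RoBERTa-Large | Overall: F1 | 64.4 | | Unverified |
| 2 | BERT-Large | Overall: F1 | 62.7 | | Unverified |
| 3 | BiDAF | Overall: F1 | 28.5 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT | MSE | 0.05 | | Unverified |

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | | Unverified |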