SOTAVerified

Reading Comprehension

Most current question answering datasets frame the task as reading comprehension: the question is about a paragraph or document, and the answer is often a span of that document.
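When the answer is a span, extractive readers (for example, BERT-style models fine-tuned on SQuAD) typically score every passage token as a possible answer start and as a possible answer end, then pick the best-scoring valid span. The Python sketch below is illustrative only. It assumes additive start/end scores and a cap on answer length. The function name `best_span`, the toy tokens and the scores are hypothetical and do not come from any particular model.

```python
from typing import List, Tuple

def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int, float]:
    """Return (start, end, score) of the best span with start <= end.

    Hypothetical helper: a span's score is assumed to be the sum of its
    start score and its end score.
    """
    best = (0, 0, float("-inf"))
    n = len(start_scores)
    for i in range(n):
        # Only consider end positions at or after the start, within the length cap.
        for j in range(i, min(n, i + max_answer_len)):
            score = start_scores[i] + end_scores[j]
            if score > best[2]:
                best = (i, j, score)
    return best

# Toy passage with made-up scores, for illustration only.
tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
start = [0.1, 0.2, 0.0, 0.0, 0.3, 2.5, 0.0]
end = [0.0, 0.1, 0.4, 0.0, 0.0, 2.0, 0.3]
s, e, _ = best_span(start, end)
print(" ".join(tokens[s:e + 1]))  # Paris
```

The `start <= end` and length constraints rule out spans that are invalid or implausibly long. This brute-force search costs O(n × max_answer_len), which is usually acceptable for passage-length inputs.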

Specific reading comprehension tasks include multi-modal machine reading comprehension and textual machine reading comprehension, among others. The literature commonly divides machine reading comprehension into four categories: cloze style, multiple choice, span prediction, and free-form answer. Read more about each category here.
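The four categories differ mainly in the form the answer takes. The Python sketch below is illustrative only and rests on several assumptions. The class names are hypothetical. The normalization (lowercasing and collapsing whitespace) and the exact-match check are also assumptions. Real benchmarks each define their own evaluation, and free-form answers in particular are usually scored with overlap metrics rather than exact match.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Cloze:            # fill in a blank in the passage or question
    gold: str

@dataclass
class MultipleChoice:   # pick one of several candidate options
    options: List[str]
    gold_index: int

@dataclass
class SpanPrediction:   # extract a contiguous span from the passage
    passage: str
    gold_span: Tuple[int, int]  # character offsets [start, end)

@dataclass
class FreeForm:         # generate an answer not restricted to the passage
    gold: str

Example = Union[Cloze, MultipleChoice, SpanPrediction, FreeForm]

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (assumed, simplified normalization)."""
    return " ".join(text.lower().split())

def is_correct(example: Example, prediction: Union[str, int]) -> bool:
    if isinstance(example, Cloze):
        return normalize(prediction) == normalize(example.gold)
    if isinstance(example, MultipleChoice):
        # The prediction is the index of the chosen option.
        return prediction == example.gold_index
    if isinstance(example, SpanPrediction):
        start, end = example.gold_span
        return normalize(prediction) == normalize(example.passage[start:end])
    if isinstance(example, FreeForm):
        # Exact match is a crude stand-in; generative answers are usually
        # scored with overlap metrics such as ROUGE or BLEU.
        return normalize(prediction) == normalize(example.gold)
    raise TypeError(f"unknown example type: {type(example).__name__}")

passage = "The Eiffel Tower is in Paris."
print(is_correct(SpanPrediction(passage, (23, 28)), "paris"))  # True
```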

Benchmark datasets used for testing a model's reading comprehension abilities include MovieQA, ReCoRD, and RACE, among others.

The Machine Reading group at UCL also provides an overview of reading comprehension tasks.

Figure source: A Survey on Machine Reading Comprehension: Tasks, Evaluation Metrics and Benchmark Datasets

Papers

Showing 851–900 of 1760 papers

Title | Status | Hype
Team Solomon at SemEval-2020 Task 4: Be Reasonable: Exploiting Large-scale Language Models for Commonsense Reasoning | | 0
FPAI at SemEval-2020 Task 10: A Query Enhanced Model with RoBERTa for Emphasis Selection | | 0
Using Machine Learning and Natural Language Processing Techniques to Analyze and Support Moderation of Student Book Discussions | | 0
IIRC: A Dataset of Incomplete Information Reading Comprehension Questions | | 0
Unsupervised Explanation Generation for Machine Reading Comprehension | | 0
CalibreNet: Calibration Networks for Multilingual Sequence Labeling | | 0
Synonym Knowledge Enhanced Reader for Chinese Idiom Reading Comprehension | Code | 0
From Dataset Recycling to Multi-Property Extraction and Beyond | Code | 0
Answer Span Correction in Machine Reading Comprehension | | 0
Improving Machine Reading Comprehension with Single-choice Decision and Transfer Learning | | 0
Context-Aware Answer Extraction in Question Answering | Code | 1
Structured Prediction for Joint Class Cardinality and Entity Property Inference in Model-Complete Text Comprehension | | 0
BiTeM at WNUT 2020 Shared Task-1: Named Entity Recognition over Wet Lab Protocols using an Ensemble of Contextual Language Models | | 0
Correcting the Misuse: A Method for the Chinese Idiom Cloze Test | | 0
How You Ask Matters: The Effect of Paraphrastic Questions to BERT Performance on a Clinical SQuAD Dataset | | 0
Q. Can Knowledge Graphs be used to Answer Boolean Questions? A. It’s complicated! | | 0
Event Extraction as Multi-turn Question Answering | | 0
ISAAQ - Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention | | 0
"You are grounded!": Latent Name Artifacts in Pre-trained Language Models | | 0
Towards Medical Machine Reading Comprehension with Structural Knowledge and Plain Text | | 0
Scene Restoring for Narrative Machine Reading Comprehension | | 0
Understanding Procedural Text using Interactive Entity Networks | | 0
Event Extraction as Machine Reading Comprehension | | 0
Logic-guided Semantic Representation Learning for Zero-Shot Relation Classification | | 0
Leveraging Extracted Model Adversaries for Improved Black Box Attacks | | 0
RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | Code | 1
Cross-lingual Machine Reading Comprehension with Language Branch Knowledge Distillation | | 0
QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications | | 0
Commonsense knowledge adversarial dataset that challenges ELECTRA | | 0
Improved Synthetic Training for Reading Comprehension | | 0
Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension | | 0
Challenges in Information-Seeking QA: Unanswerable Questions and Paragraph Retrieval | | 0
mT5: A massively multilingual pre-trained text-to-text transformer | Code | 1
Probing and Fine-tuning Reading Comprehension Models for Few-shot Event Extraction | | 0
RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering | Code | 1
Knowledge Distillation for Improved Accuracy in Spoken Question Answering | | 0
Bi-directional Cognitive Thinking Network for Machine Reading Comprehension | | 0
Deriving Commonsense Inference Tasks from Interactive Fictions | | 0
Technical Question Answering across Tasks and Domains | Code | 0
Towards Interpreting BERT for Reading Comprehension Based QA | Code | 0
A Wrong Answer or a Wrong Question? An Intricate Relationship between Question Reformulation and Answer Selection in Conversational Question Answering | Code | 0
Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension | | 0
Multi-Stage Pre-training for Low-Resource Domain Adaptation | | 0
Open-Domain Question Answering Goes Conversational via Question Rewriting | Code | 1
Counterfactually-Augmented SNLI Training Data Does Not Yield Better Generalization Than Unaugmented Data | Code | 0
MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics | Code | 1
PolicyQA: A Reading Comprehension Dataset for Privacy Policies | Code | 1
Context Modeling with Evidence Filter for Multiple Choice Question Answering | | 0
Interactive Fiction Game Playing as Multi-Paragraph Reading Comprehension with Reinforcement Learning | Code | 1
Discern: Discourse-Aware Entailment Reasoning Network for Conversational Machine Reading | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Rational Reasoner / IDOL | Test | 80.6 | | Unverified
2 | AMR-LE-Ensemble | Test | 80 | | Unverified
3 | MERIt (MERIt-deberta-v2-xxlarge) | Test | 79.3 | | Unverified
4 | MERIt-deberta-v2-xxlarge deberta.v2.xxlarge.path.override_True.norm_1.1.0.w2.A100.cp200.s42 | Test | 79.3 | | Unverified
5 | Knowledge model | Test | 79.2 | | Unverified
6 | DeBERTa-v2-xxlarge-AMR-LE-Contraposition | Test | 77.2 | | Unverified
7 | LReasoner ensemble | Test | 76.1 | | Unverified
8 | ELECTRA and ALBERT | Test | 71 | | Unverified
9 | WWZ | Test | 69.7 | | Unverified
10 | xlnet-large-uncased [extended data] | Test | 69.3 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | ALBERT (Ensemble) | Accuracy | 91.4 | | Unverified
2 | Megatron-BERT (ensemble) | Accuracy | 90.9 | | Unverified
3 | ALBERT-xxlarge + DUMA (ensemble) | Accuracy | 89.8 | | Unverified
4 | Megatron-BERT | Accuracy | 89.5 | | Unverified
5 | XLNet | Accuracy (Middle) | 88.6 | | Unverified
6 | DeBERTa-large | Accuracy | 86.8 | | Unverified
7 | B10-10-10 | Accuracy | 85.7 | | Unverified
8 | RoBERTa | Accuracy | 83.2 | | Unverified
9 | Orca 2-13B | Accuracy | 82.87 | | Unverified
10 | Orca 2-7B | Accuracy | 80.79 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | Golden Transformer | Average F1 | 0.94 | | Unverified
2 | MT5 Large | Average F1 | 0.84 | | Unverified
3 | ruRoberta-large finetune | Average F1 | 0.83 | | Unverified
4 | ruT5-large-finetune | Average F1 | 0.82 | | Unverified
5 | Human Benchmark | Average F1 | 0.81 | | Unverified
6 | ruT5-base-finetune | Average F1 | 0.77 | | Unverified
7 | ruBert-large finetune | Average F1 | 0.76 | | Unverified
8 | ruBert-base finetune | Average F1 | 0.74 | | Unverified
9 | RuGPT3XL few-shot | Average F1 | 0.74 | | Unverified
10 | RuGPT3Large | Average F1 | 0.73 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | RoBERTa-Large | Overall: F1 | 64.4 | | Unverified
2 | BERT-Large | Overall: F1 | 62.7 | | Unverified
3 | BiDAF | Overall: F1 | 28.5 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BERT | MSE | 0.05 | | Unverified
# | Model | Metric | Claimed | Verified | Status
1 | BERT pretrained on MIMIC-III | Answer F1 | 63.55 | | Unverified
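Several of the leaderboards above report an F1 score. A common definition in reading comprehension, popularized by SQuAD, is token-overlap F1 between a predicted answer string and a gold answer string, taking the maximum over all gold answers. The listings do not say exactly how each benchmark computes its F1, so the Python sketch below is an assumption rather than any benchmark's official scorer. Its normalization (lowercasing, stripping punctuation, removing the articles a/an/the) mirrors the SQuAD evaluation script, and the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import Sequence

PUNCTUATION = set(string.punctuation)

def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; otherwise no overlap is possible.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def max_f1(prediction: str, golds: Sequence[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(token_f1(prediction, g) for g in golds)

print(round(token_f1("the Eiffel Tower", "Eiffel Tower in Paris"), 3))  # 0.667
```

Per-question scores are then typically averaged over the evaluation set. Some leaderboards report the result on a 0–1 scale and others on a 0–100 scale.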