SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
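As a rough illustration of the two metrics named above, the sketch below computes exact match and token-level F1 for a single prediction against a single reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); individual benchmarks ship their own official scoring scripts, which may differ in detail. The function names `normalize_answer`, `exact_match`, and `f1_score` are illustrative and not taken from any particular library.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization; other benchmarks may differ.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower."))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

In practice, a dataset-level score is usually the average of these per-question values, taking the maximum over all reference answers when a question has several, and is reported as a percentage.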


Papers

Showing 6001–6025 of 10,817 papers

Title | Status | Hype
Two-Turn Debate Doesn't Help Humans Answer Hard Reading Comprehension Questions | - | 0
QA Domain Adaptation using Hidden Space Augmentation and Self-Supervised Contrastive Adaptation | Code | 0
Image Semantic Relation Generation | - | 0
Aligning MAGMA by Few-Shot Learning and Finetuning | - | 0
Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering | - | 0
CAVE: Correcting Attribute Values in E-commerce Profiles | Code | 0
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training | Code | 0
ReasonChainQA: Text-based Complex Question Answering with Explainable Evidence Chains | - | 0
Adversarial and Safely Scaled Question Generation | - | 0
Answer ranking in Community Question Answering: a deep learning approach | - | 0
Can Language Representation Models Think in Bets? | - | 0
"John is 50 years old, can his son be 65?" Evaluating NLP Models' Understanding of Feasibility | Code | 0
TweetNERD -- End to End Entity Linking Benchmark for Tweets | Code | 0
ConEntail: An Entailment-based Framework for Universal Zero and Few Shot Classification with Supervised Contrastive Pretraining | Code | 0
Closed-book Question Generation via Contrastive Learning | Code | 0
Shortcomings of Question Answering Based Factuality Frameworks for Error Localization | Code | 0
SODAPOP: Open-Ended Discovery of Social Biases in Social Commonsense Reasoning Models | Code | 0
Benchmarking Long-tail Generalization with Likelihood Splits | Code | 0
Overview of BioASQ 2022: The tenth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | - | 0
Challenges in Explanation Quality Evaluation | - | 0
Towards End-to-End Open Conversational Machine Reading | Code | 0
CIKQA: Learning Commonsense Inference with a Unified Knowledge-in-the-loop QA Paradigm | - | 0
Are Sample-Efficient NLP Models More Robust? | - | 0
Question Answering Over Biological Knowledge Graph via Amazon Alexa | - | 0
Relational Graph Convolutional Neural Networks for Multihop Reasoning: A Comparative Study | - | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | - | Unverified
2 | FPNet (ensemble) | EM | 90.87 | - | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | - | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | - | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | - | Unverified
6 | FPNet (ensemble) | EM | 90.6 | - | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | - | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | - | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | - | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | - | Unverified