SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
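As an illustration of the two metrics named above, here is a minimal sketch of EM and token-level F1 for a single prediction and reference answer. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names are hypothetical, and individual benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the common SQuAD-style normalization;
    other benchmarks may use different rules.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With these definitions, `exact_match("The Eiffel Tower.", "eiffel tower")` gives 1.0, while `f1_score("in Paris France", "Paris")` gives 0.5 (precision 1/3, recall 1). When a question has several reference answers, a common convention is to take the maximum score over the references and then average over all questions.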


Papers

Showing 1851-1875 of 10817 papers

Title | Status | Hype
QA-GNN: Reasoning with Language Models and Knowledge Graphs for Question Answering | Code | 1
SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning | Code | 1
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections | Code | 1
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Code | 1
CBench: Towards Better Evaluation of Question Answering Over Knowledge Graphs | Code | 1
Conversational Question Answering over Knowledge Graphs with Transformer and Graph Attention Networks | Code | 1
MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Code | 1
VisQA: X-raying Vision and Language Reasoning in Transformers | Code | 1
NLQuAD: A Non-Factoid Long Question Answering Data Set | Code | 1
FeTaQA: Free-form Table Question Answering | Code | 1
Are Bias Mitigation Techniques for Deep Learning Effective? | Code | 1
MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models | Code | 1
Automatically Generating Cause-and-Effect Questions from Passages | Code | 1
Towards General Purpose Vision Systems | Code | 1
SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning over Traffic Events | Code | 1
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers | Code | 1
A Comprehensive Review of the Video-to-Text Problem | Code | 1
On the hidden treasure of dialog in video question answering | Code | 1
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark | Code | 1
QuestEval: Summarization Asks for Fact-based Evaluation | Code | 1
Multi-Modal Answer Validation for Knowledge-Based VQA | Code | 1
Controllable Generation from Pre-trained Language Models via Inverse Prompting | Code | 1
Cooperative Self-training of Machine Reading Comprehension | Code | 1
Knowledge Graph Question Answering using Graph-Pattern Isomorphism | Code | 1
Hurdles to Progress in Long-form Question Answering | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | | Unverified
2 | FPNet (ensemble) | EM | 90.87 | | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified
6 | FPNet (ensemble) | EM | 90.6 | | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified