SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
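
As a rough illustration of the two metrics, the sketch below computes EM and token-level F1 for a single predicted answer against a single reference. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the function names are illustrative, and individual benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty answers count as a match; one empty answer does not.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

Here the first call counts as an exact match because normalization removes the article, the casing, and the punctuation. The second prediction contains the reference plus two extra tokens, so recall is 1, precision is 1/3, and F1 comes out at about 0.5. When a question has several reference answers, a common convention is to score against each and keep the maximum, then average over all questions.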


Papers

Showing 1376-1400 of 10817 papers

Title | Status | Hype
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation | Code | 1
Attributed Question Answering: Evaluation and Modeling for Attributed Large Language Models | Code | 1
APOLLO: An Optimized Training Approach for Long-form Numerical Reasoning | Code | 1
VindLU: A Recipe for Effective Video-and-Language Pretraining | Code | 1
Hierarchical multimodal transformers for Multi-Page DocVQA | Code | 1
Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer | Code | 1
UniKGQA: Unified Retrieval and Reasoning for Solving Multi-hop Question Answering Over Knowledge Graph | Code | 1
Nonparametric Masked Language Modeling | Code | 1
Relation-Aware Language-Graph Transformer for Question Answering | Code | 1
A Sequential Flow Control Framework for Multi-hop Knowledge Base Question Answering | Code | 1
Super-CLEVR: A Virtual Benchmark to Diagnose Domain Robustness in Visual Reasoning | Code | 1
AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning | Code | 1
CREPE: Open-Domain Question Answering with False Presuppositions | Code | 1
Frustratingly Easy Label Projection for Cross-lingual Transfer | Code | 1
Self-supervised vision-language pretraining for Medical visual question answering | Code | 1
Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | Code | 1
Hengam: An Adversarially Trained Transformer for Persian Temporal Tagging | Code | 1
Visual Commonsense-aware Representation Network for Video Captioning | Code | 1
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision | Code | 1
MapQA: A Dataset for Question Answering on Choropleth Maps | Code | 1
QAmeleon: Multilingual QA with Only 5 Examples | Code | 1
PromptCap: Prompt-Guided Task-Aware Image Captioning | Code | 1
Large Language Models Struggle to Learn Long-Tail Knowledge | Code | 1
Retrieval-Augmented Generative Question Answering for Event Argument Extraction | Code | 1
Mining Mathematical Documents for Question Answering via Unsupervised Formula Labeling | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | IE-Net (ensemble) | EM | 90.94 | - | Unverified
2 | FPNet (ensemble) | EM | 90.87 | - | Unverified
3 | IE-NetV2 (ensemble) | EM | 90.86 | - | Unverified
4 | SA-Net on Albert (ensemble) | EM | 90.72 | - | Unverified
5 | SA-Net-V2 (ensemble) | EM | 90.68 | - | Unverified
6 | FPNet (ensemble) | EM | 90.6 | - | Unverified
7 | Retro-Reader (ensemble) | EM | 90.58 | - | Unverified
8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | - | Unverified
9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | - | Unverified
10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | - | Unverified