SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
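
As a rough illustration of the two metrics named above, the following is a minimal sketch of SQuAD-style exact match and token-level F1 for a single prediction and a single reference answer. The function names (`normalize`, `exact_match`, `f1_score`) and the normalization steps (lowercasing, stripping punctuation and English articles) are assumptions made for this example; individual benchmarks may normalize differently, and official scripts typically take the maximum score over several reference answers.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    These normalization steps are an assumption modeled on SQuAD-style scoring.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower"))
    print(f1_score("in Paris, France", "Paris"))
```

EM rewards only answers that match after normalization, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.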


Papers

Showing 7226–7250 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | | 0 |
| Connecting Language and Vision to Actions | | 0 |
| Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | | 0 |
| Overview of Factify5WQA: Fact Verification through 5W Question-Answering | | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | | 0 |
| AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders | | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | | 0 |
| Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge | | 0 |
| Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | | 0 |
| PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | | 0 |
| GW_QA at SemEval-2017 Task 3: Question Answer Re-ranking on Arabic Fora | | 0 |
| P^3LM: Probabilistically Permuted Prophet Language Modeling for Generative Pre-Training | | 0 |
| PABI: A Unified PAC-Bayesian Informativeness Measure for Incidental Supervision Signals | | 0 |
| PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification | | 0 |
| A Study on Expert Sourcing Enterprise Question Collection and Classification | | 0 |
| Accelerating Real-Time Question Answering via Question Generation | | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | | 0 |
| Págico: Evaluating Wikipedia-based information retrieval in Portuguese | | 0 |
| Paired Examples as Indirect Supervision in Latent Decision Models | | 0 |
| Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering | | 0 |
| Confidence Estimation for Knowledge Base Population | | 0 |
| Pairwise Relation Classification with Mirror Instances and a Combined Convolutional Neural Network | | 0 |
| Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement | | 0 |
| GUITAR: Gradient Pruning toward Fast Neural Ranking | | 0 |
Page 290 of 433

Benchmark Results

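| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |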