SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, and WikiQA, among many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.

(Image credit: SQuAD)
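The EM and F1 metrics mentioned above can be sketched in a few lines. This is a minimal illustration, assuming SQuAD-style answer normalization (lowercasing, removing punctuation and English articles, collapsing whitespace); the page does not say which normalization each benchmark applies, and the function names `normalize_answer`, `exact_match`, and `f1_score` are hypothetical choices for this example.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; other benchmarks may differ.
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, the prediction "The Eiffel Tower" against the reference "eiffel tower." scores an exact match of 1.0, while "in Paris, France" against "Paris" scores an exact match of 0.0 and an F1 of 0.5 (precision 1/3, recall 1). Leaderboards commonly take the maximum over several reference answers per question and average over the dataset; that aggregation is left out here.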

Papers

Showing 7201-7250 of 10817 papers

| Title | Status | Hype |
| --- | --- | --- |
| Hallucination Augmented Recitations for Language Models | | 0 |
| Orca 2: Teaching Small Language Models How to Reason | | 0 |
| Order Matters: Exploring Order Sensitivity in Multimodal Large Language Models | | 0 |
| LLM-aided explanations of EDA synthesis errors | | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | | 0 |
| ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation | | 0 |
| CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning | | 0 |
| Orthogonality of Syntax and Semantics within Distributional Spaces | | 0 |
| Is There No Such Thing as a Bad Question? H4R: HalluciBot For Ratiocination, Rewriting, Ranking, and Routing | | 0 |
| Orthogonality regularizer for question answering | | 0 |
| OSU_CHGCG at SemEval-2016 Task 9 : Chinese Semantic Dependency Parsing with Generalized Categorial Grammar | | 0 |
| AMR Parsing with an Incremental Joint Model | | 0 |
| Explanation as Question Answering based on a Task Model of the Agent's Design | | 0 |
| Out of the Box: Reasoning with Graph Convolution Nets for Factual Visual Question Answering | | 0 |
| HAIR: Hierarchical Visual-Semantic Relational Reasoning for Video Question Answering | | 0 |
| A Study on Multimodal and Interactive Explanations for Visual Question Answering | | 0 |
| Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training | | 0 |
| Overcoming Language Priors for Visual Question Answering Based on Knowledge Distillation | | 0 |
| Overcoming Language Priors in Visual Question Answering with Adversarial Regularization | | 0 |
| Architecture for a Trustworthy Quantum Chatbot | | 0 |
| PeCoQ: A Dataset for Persian Complex Question Answering over Knowledge Graph | | 0 |
| Overcoming the vanishing gradient problem in plain recurrent networks | | 0 |
| Overfitting at SemEval-2016 Task 3: Detecting Semantically Similar Questions in Community Question Answering Forums with Word Embeddings | | 0 |
| Overinformative Question Answering by Humans and Machines | | 0 |
| Overview of BioASQ 2020: The eighth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | | 0 |
| Overview of BioASQ 2021: The ninth BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | | 0 |
| Connecting Language and Vision to Actions | | 0 |
| Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering | | 0 |
| Overview of Factify5WQA: Fact Verification through 5W Question-Answering | | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | | 0 |
| AMRITA_CEN@SemEval-2015: Paraphrase Detection for Twitter using Unsupervised Feature Learning with Recursive Autoencoders | | 0 |
| Overview of the MedVidQA 2022 Shared Task on Medical Video Question-Answering | | 0 |
| Overview of the NLPCC 2025 Shared Task 4: Multi-modal, Multilingual, and Multi-hop Medical Instructional Video Question Answering Challenge | | 0 |
| Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track | | 0 |
| PEACE: Empowering Geologic Map Holistic Understanding with MLLMs | | 0 |
| OVQA: A Clinically Generated Visual Question Answering Dataset | | 0 |
| GW_QA at SemEval-2017 Task 3: Question Answer Re-ranking on Arabic Fora | | 0 |
| P^3LM: Probabilistically Permuted Prophet Language Modeling for Generative Pre-Training | | 0 |
| PABI: A Unified PAC-Bayesian Informativeness Measure for Incidental Supervision Signals | | 0 |
| PaCCSS-IT: A Parallel Corpus of Complex-Simple Sentences for Automatic Text Simplification | | 0 |
| A Study on Expert Sourcing Enterprise Question Collection and Classification | | 0 |
| Accelerating Real-Time Question Answering via Question Generation | | 0 |
| PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models | | 0 |
| Págico: Evaluating Wikipedia-based information retrieval in Portuguese | | 0 |
| Paired Examples as Indirect Supervision in Latent Decision Models | | 0 |
| Exploiting Bilingual Translation for Question Retrieval in Community-Based Question Answering | | 0 |
| Confidence Estimation for Knowledge Base Population | | 0 |
| Pairwise Relation Classification with Mirror Instances and a Combined Convolutional Neural Network | | 0 |
| Pairwise Word Interaction Modeling with Deep Neural Networks for Semantic Similarity Measurement | | 0 |
| GUITAR: Gradient Pruning toward Fast Neural Ranking | | 0 |

Benchmark Results

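| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |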