SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with metrics such as exact match (EM) and F1. Recent top-performing models include T5 and XLNet.

(Image credit: SQuAD)
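The EM and F1 metrics mentioned above are usually computed SQuAD-style: answers are normalized (lowercased, punctuation and articles stripped), EM checks for an exact string match, and F1 measures token overlap between prediction and gold answer. A minimal sketch, assuming the standard SQuAD-style normalization (function names here are illustrative, not from any particular library):

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace
    (the normalization convention used by SQuAD-style evaluation)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over
    the multiset overlap of normalized tokens."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `exact_match("The Eiffel Tower!", "eiffel tower")` is 1.0 after normalization, while a longer prediction like "eiffel tower in paris" against gold "the eiffel tower" gets partial F1 credit (precision 0.5, recall 1.0). Benchmarks with multiple gold answers per question typically take the maximum score over all references.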

Papers

Showing 9751–9800 of 10817 papers

What Question Answering can Learn from Trivia Nerds
What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning
What's in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams
What's in Your Head? Emergent Behaviour in Multi-Task Transformer Models
What Would a Teacher Do? Predicting Future Talk Moves
What Would it Take to get Biomedical QA Systems into Practice?
When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation
When are Lemons Purple? The Concept Association Bias of Vision-Language Models
When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus
When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust
When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD
When to Read Documents or QA History: On Unified and Selective Open-domain QA
When to Speak, When to Abstain: Contrastive Decoding with Abstention
When Two LLMs Debate, Both Think They'll Win
Where is Linked Data in Question Answering over Linked Data?
Where is this coming from? Making groundedness count in the evaluation of Document VQA models
Where To Look: Focus Regions for Visual Question Answering
Where Was Alexander the Great in 325 BC? Toward Understanding History Text with a World Model
Where Was COVID-19 First Discovered? Designing a Question-Answering System for Pandemic Situations
Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering
Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering
Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above
Which Step Do I Take First? Troubleshooting with Bayesian Models
"Who was Pietro Badoglio?" Towards a QA system for Italian History
Why Artificial Intelligence Needs a Task Theory --- And What It Might Look Like
Why "Blow Out"? A Structural Analysis of the Movie Dialog Dataset
Why can't memory networks read effectively?
Why context matters in VQA and Reasoning: Semantic interventions for VLM input modalities
Why Does a Visual Question Have Different Answers?
Why Does ChatGPT Fall Short in Providing Truthful Answers?
Why Does the VQA Model Answer No?: Improving Reasoning through Visual and Linguistic Inference
Why Do Masked Neural Language Models Still Need Common Sense Knowledge?
Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?
Why-Question Answering using Intra- and Inter-Sentential Causal Relations
Why Question Answering using Sentiment Analysis and Word Classes
Why Settle for Just One? Extending EL++ Ontology Embeddings with Many-to-Many Relationships
Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
Wikidata as a seed for Web Extraction
Wiki-LLaVA: Hierarchical Retrieval-Augmented Generation for Multimodal LLMs
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
WikiOmnia: generative QA corpus on the whole Russian Wikipedia
WikiPassageQA: A Benchmark Collection for Research on Non-factoid Answer Passage Retrieval
WikiQA: A Challenge Dataset for Open-Domain Question Answering
WikiTalk: A Spoken Wikipedia-based Open-Domain Knowledge Access System
WikiWhy: Answering and Explaining Cause-and-Effect Questions
WildQA: In-the-Wild Video Question Answering
Will the Prince Get True Love's Kiss? On the Model Sensitivity to Gender Perturbation over Fairytale Texts
Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering
Winning Big with Small Models: Knowledge Distillation vs. Self-Training for Reducing Hallucination in QA Agents
Page 196 of 217

Benchmark Results

#  | Model                                           | Metric | Claimed | Verified | Status
1  | IE-Net (ensemble)                               | EM     | 90.94   |          | Unverified
2  | FPNet (ensemble)                                | EM     | 90.87   |          | Unverified
3  | IE-NetV2 (ensemble)                             | EM     | 90.86   |          | Unverified
4  | SA-Net on Albert (ensemble)                     | EM     | 90.72   |          | Unverified
5  | SA-Net-V2 (ensemble)                            | EM     | 90.68   |          | Unverified
6  | FPNet (ensemble)                                | EM     | 90.6    |          | Unverified
7  | Retro-Reader (ensemble)                         | EM     | 90.58   |          | Unverified
8  | EntitySpanFocusV2 (ensemble)                    | EM     | 90.52   |          | Unverified
9  | TransNets + SFVerifier + SFEnsembler (ensemble) | EM     | 90.49   |          | Unverified
10 | EntitySpanFocus+AT (ensemble)                   | EM     | 90.45   |          | Unverified