SOTAVerified

Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
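The EM and F1 metrics mentioned above follow the SQuAD evaluation convention: answers are normalized (lowercased, punctuation and articles stripped) before comparison. A minimal sketch of that convention — the function names here are illustrative, not from any particular library:

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (SQuAD convention)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))

def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between normalized prediction and reference."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For datasets with multiple reference answers, the official scripts take the maximum score over all references; that step is omitted here for brevity.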

(Image credit: SQuAD)

Papers

Showing 9751–9775 of 10817 papers

| Title | Status | Hype |
|---|---|---|
| What or Who is Multilingual Watson? | | 0 |
| What Question Answering can Learn from Trivia Nerds | | 0 |
| What Should I Do Now? Marrying Reinforcement Learning and Symbolic Planning | | 0 |
| What's in an Explanation? Characterizing Knowledge and Inference Requirements for Elementary Science Exams | | 0 |
| What's in your Head? Emergent Behaviour in Multi-Task Transformer Models | | 0 |
| What Would a Teacher Do? Predicting Future Talk Moves | | 0 |
| What Would it Take to get Biomedical QA Systems into Practice? | | 0 |
| When ACE met KBP: End-to-End Evaluation of Knowledge Base Population with Component-level Annotation | | 0 |
| When are Lemons Purple? The Concept Association Bias of Vision-Language Models | | 0 |
| When Crowd Meets Persona: Creating a Large-Scale Open-Domain Persona Dialogue Corpus | | 0 |
| When Giant Language Brains Just Aren't Enough! Domain Pizzazz with Knowledge Sparkle Dust | | 0 |
| When is dataset cartography ineffective? Using training dynamics does not improve robustness against Adversarial SQuAD | | 0 |
| When to Read Documents or QA History: On Unified and Selective Open-domain QA | | 0 |
| When to Speak, When to Abstain: Contrastive Decoding with Abstention | | 0 |
| When Two LLMs Debate, Both Think They'll Win | | 0 |
| Where is Linked Data in Question Answering over Linked Data? | | 0 |
| Where is this coming from? Making groundedness count in the evaluation of Document VQA models | | 0 |
| Where To Look: Focus Regions for Visual Question Answering | | 0 |
| Where Was Alexander the Great in 325 BC? Toward Understanding History Text with a World Model | | 0 |
| Where Was COVID-19 First Discovered? Designing a Question-Answering System for Pandemic Situations | | 0 |
| Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering | | 0 |
| Which Linguist Invented the Lightbulb? Presupposition Verification for Question-Answering | | 0 |
| Which of These Best Describes Multiple Choice Evaluation with LLMs? A) Forced B) Flawed C) Fixable D) All of the Above | | 0 |
| Which Step Do I Take First? Troubleshooting with Bayesian Models | | 0 |
Page 391 of 433

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | IE-Net (ensemble) | EM | 90.94 | | Unverified |
| 2 | FPNet (ensemble) | EM | 90.87 | | Unverified |
| 3 | IE-NetV2 (ensemble) | EM | 90.86 | | Unverified |
| 4 | SA-Net on Albert (ensemble) | EM | 90.72 | | Unverified |
| 5 | SA-Net-V2 (ensemble) | EM | 90.68 | | Unverified |
| 6 | FPNet (ensemble) | EM | 90.6 | | Unverified |
| 7 | Retro-Reader (ensemble) | EM | 90.58 | | Unverified |
| 8 | EntitySpanFocusV2 (ensemble) | EM | 90.52 | | Unverified |
| 9 | TransNets + SFVerifier + SFEnsembler (ensemble) | EM | 90.49 | | Unverified |
| 10 | EntitySpanFocus+AT (ensemble) | EM | 90.45 | | Unverified |