Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
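As a rough illustration of the two metrics named above, the sketch below computes exact match and token-level F1 for a single predicted answer against a single reference. The normalization (lowercasing, stripping punctuation and English articles) is modeled on common SQuAD-style scoring and is an assumption here, not the exact script behind any result on this page. The function names `normalize_answer`, `exact_match`, and `f1_score` are illustrative.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style normalization; other benchmarks may normalize differently.
_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # If either side is empty, score 1.0 only when both are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Leaderboard numbers are usually these per-example scores averaged over a dataset and multiplied by 100; when a question has several reference answers, the maximum over references is typically taken.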


Papers


Showing 9726–9750 of 10817 papers

Title	Date	Tasks	Status
Weaver: Deep Co-Encoding of Questions and Documents for Machine Reading	Apr 27, 2018	Open-Domain Question Answering, Question Answering	Unverified
Weaver: Interweaving SQL and LLM for Table Reasoning	May 25, 2025	Question Answering, Table-based Question Answering	Unverified
WebFAQ: A Multilingual Collection of Natural Q&A Datasets for Dense Retrieval	Feb 28, 2025	Dataset Generation, Open-Domain Question Answering	Unverified
WebLists: Extracting Structured Information From Complex Interactive Websites Using Executable LLM Agents	Apr 17, 2025	Navigate, Question Answering	Unverified
Web pages segmentation for document selection in Question Answering (Pré-segmentation de pages web et sélection de documents pertinents en Questions-Réponses) [in French]	Jun 1, 2013	Question Answering	Unverified
WebQuest: A Benchmark for Multimodal QA on Web Page Sequences	Sep 6, 2024	Question Answering	Unverified
Web Table Extraction, Retrieval and Augmentation: A Survey	Feb 1, 2020	Question Answering, Retrieval	Unverified
We Need to Talk About Classification Evaluation Metrics in NLP	Jan 8, 2024	Diversity, Machine Translation	Unverified
Werdy: Recognition and Disambiguation of Verbs and Verb Phrases with Syntactic and Semantic Pruning	Oct 1, 2014	Question Answering, Semantic Parsing	Unverified
What are the limits of cross-lingual dense passage retrieval for low-resource languages?	Aug 21, 2024	Answer Generation, Language Modeling	Unverified
What BERTs and GPTs know about your brand? Probing contextual language models for affect associations	Jun 1, 2021	Attribute, Marketing	Unverified
What can AI do for me: Evaluating Machine Learning Interpretations in Cooperative Play	Oct 23, 2018	BIG-bench Machine Learning, Decision Making	Unverified
What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse	Aug 1, 2013	Coreference Resolution, Named Entity Recognition (NER)	Unverified
What Does Neuro Mean to Cardio? Investigating the Role of Clinical Specialty Data in Medical LLMs	May 15, 2025	All, Benchmarking	Unverified
What do we expect from Multiple-choice QA Systems?	Nov 20, 2020	Multiple-choice, Multiple Choice Question Answering (MCQA)	Unverified
What Gives the Answer Away? Question Answering Bias Analysis on Video QA Datasets	Jul 7, 2020	Multiple-choice, Question Answering	Unverified
What Information is Helpful for Dependency Based Semantic Role Labeling	Oct 1, 2013	Chunking, Dependency Parsing	Unverified
Alexpaca: Learning Factual Clarification Question Generation Without Examples	Oct 17, 2023	Benchmarking, Chatbot	Unverified
What is Event Knowledge Graph: A Survey	Dec 31, 2021	Question Answering, Survey	Unverified
What is needed for simple spatial language capabilities in VQA?	Aug 17, 2019	Diagnostic, Question Answering	Unverified
What Large Language Models Bring to Text-rich VQA?	Nov 13, 2023	Image Comprehension, Optical Character Recognition (OCR)	Unverified
What Makes a Good Dataset for Symbol Description Reading?	Apr 17, 2023	document understanding, Math	Unverified
What Makes Good In-Context Examples for GPT-3?	May 1, 2022	In-Context Learning, Natural Language Understanding	Unverified
What makes us curious? analysis of a corpus of open-domain questions	Oct 28, 2021	Question Answering	Unverified
What more can Entity Linking do for Question Answering?	Oct 15, 2020	coreference-resolution, Coreference Resolution	Unverified



Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified