Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluation question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like EM and F1. Some recent top performing models are T5 and XLNet.

( Image credit: SQuAD )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 2231–2240 of 10817 papers

Title	Date	Tasks	Status	Hype	Score
MedAgentBoard: Benchmarking Multi-Agent Collaboration with Conventional Methods for Diverse Medical Tasks	May 18, 2025	BenchmarkingMedical Visual Question Answering	CodeCode Available	1	5
Mintaka: A Complex, Natural, and Multilingual Dataset for End-to-End Question Answering	Oct 4, 2022	Question Answering	CodeCode Available	1	5
Context Awareness Gate For Retrieval Augmented Generation	Nov 25, 2024	Open-Domain Question AnsweringQuestion Answering	CodeCode Available	1	5
DocNLI: A Large-scale Dataset for Document-level Natural Language Inference	Jun 17, 2021	Natural Language InferenceQuestion Answering	CodeCode Available	1	5
MTBench: A Multimodal Time Series Benchmark for Temporal Reasoning and Question Answering	Mar 21, 2025	Question AnsweringTime Series	CodeCode Available	1	5
QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking	Oct 11, 2023	Decision MakingFact Checking	CodeCode Available	1	5
DocVXQA: Context-Aware Visual Explanations for Document Question Answering	May 12, 2025	Question Answering	CodeCode Available	1	5
DocVQA: A Dataset for VQA on Document Images	Jul 1, 2020	Question AnsweringReading Comprehension	CodeCode Available	1	5
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement	Jun 16, 2025	document understandingQuestion Answering	CodeCode Available	1	5
ZusammenQA: Data Augmentation with Specialized Models for Cross-lingual Open-retrieval Question Answering System	May 30, 2022	Answer GenerationData Augmentation	CodeCode Available	1	5

Show:10 25 50

← PrevPage 224 of 1082Next →

All datasets SQuAD2.0 SQuAD1.1 HotpotQA PIQA BoolQ COPA TriviaQA SQuAD1.1 dev Natural Questions OpenBookQA TruthfulQA MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified