Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
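As a rough illustration of the two metrics named above, the following is a minimal sketch of exact match and token-level F1 for a single predicted answer against a single reference. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the function names `normalize`, `exact_match`, and `f1_score` are illustrative and not taken from any particular benchmark's official scorer.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace (assumed SQuAD-style)."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower!"))
    print(f1_score("in Paris France", "Paris"))
```

When a question has several acceptable reference answers, a common convention is to take the maximum score over the references and then average over all questions; that aggregation is left out here to keep the sketch short.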


Papers


Showing 871–880 of 10817 papers

Title	Date	Tasks	Status	Hype
MediQ: Question-Asking LLMs and a Benchmark for Reliable Interactive Clinical Reasoning	Jun 3, 2024	Diagnostic, MedQA	Code Available	1
Compositional 4D Dynamic Scenes Understanding with Physics Priors for Video Question Answering	Jun 2, 2024	counterfactual, Counterfactual Reasoning	Code Available	1
Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA	May 30, 2024	Diagnostic, Medical Diagnosis	Code Available	1
Encoding and Controlling Global Semantics for Long-form Video Question Answering	May 30, 2024	Form, Question Answering	Code Available	1
One Token Can Help! Learning Scalable and Pluggable Virtual Tokens for Retrieval-Augmented Large Language Models	May 30, 2024	Question Answering, RAG	Code Available	1
MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Interactions	May 29, 2024	Benchmarking, Dialogue Understanding	Code Available	1
Reverse Image Retrieval Cues Parametric Memory in Multimodal LLMs	May 29, 2024	Image Retrieval, Question Answering	Code Available	1
THREAD: Thinking Deeper with Recursive Spawning	May 27, 2024	Few-Shot Learning, Question Answering	Code Available	1
Map-based Modular Approach for Zero-shot Embodied Question Answering	May 26, 2024	Embodied Question Answering, Navigate	Code Available	1
Semantic Density: Uncertainty Quantification for Large Language Models through Confidence Measurement in Semantic Space	May 22, 2024	Misinformation, Question Answering	Code Available	1



Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified