Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, and WikiQA, among many others. Models are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
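
As a rough illustration of the two metrics named above, here is a minimal sketch of EM and token-level F1 for a single prediction. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and the articles a/an/the); the function names are illustrative and not taken from any particular evaluation script.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the SQuAD-style normalization convention.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower."))
    print(f1_score("Eiffel Tower in Paris", "the Eiffel Tower"))
```

Leaderboard figures are usually these per-question scores averaged over the whole evaluation set and reported on a 0-100 scale; when a question has several reference answers, the common convention is to take the maximum score over the references.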

Papers

Showing 1281–1290 of 10817 papers

Title	Date	Tasks	Status	Hype
RETQA: A Large-Scale Open-Domain Tabular Question Answering Dataset for Real Estate Sector	Dec 13, 2024	In-Context Learning, Question Answering	Code Available	1
VLR-Bench: Multilingual Benchmark Dataset for Vision-Language Retrieval Augmented Generation	Dec 13, 2024	Instruction Following, Question Answering	Unverified	0
IQViC: In-context, Question Adaptive Vision Compressor for Long-term Video Understanding LMMs	Dec 13, 2024	Question Answering, Video Question Answering	Unverified	0
OG-RAG: Ontology-Grounded Retrieval-Augmented Generation For Large Language Models	Dec 12, 2024	Question Answering, RAG	Unverified	0
ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation	Dec 12, 2024	Phrase Grounding, Question Answering	Unverified	0
Multi-Scale Heterogeneous Text-Attributed Graph Datasets From Diverse Domains	Dec 12, 2024	Community Question Answering, Graph Learning	Code Available	0
Unifying AI Tutor Evaluation: An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors	Dec 12, 2024	Question Answering	Code Available	1
ViUniT: Visual Unit Tests for More Robust Visual Programming	Dec 12, 2024	Image Generation, Image-text matching	Unverified	0
Assessing the Robustness of Retrieval-Augmented Generation Systems in K-12 Educational Question Answering with Knowledge Discrepancies	Dec 12, 2024	Question Answering, RAG	Unverified	0
Doe-1: Closed-Loop Autonomous Driving with Large World Model	Dec 12, 2024	Autonomous Driving, Decision Making	Code Available	2

Page 129 of 1082

Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified