Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluation question answering systems include SQuAD, HotPotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like EM and F1. Some recent top performing models are T5 and XLNet.

( Image credit: SQuAD )

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 1541–1550 of 10817 papers

Title	Date	Tasks	Status	Hype
Distinguishing Ignorance from Error in LLM Hallucinations	Oct 29, 2024	HallucinationQuestion Answering	CodeCode Available	1
Are VLMs Really Blind	Oct 29, 2024	Language ModelingLanguage Modelling	CodeCode Available	0
Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review	Oct 29, 2024	Prompt LearningQuestion Answering	CodeCode Available	0
Enhancing Financial Question Answering with a Multi-Agent Reflection Framework	Oct 29, 2024	Question Answering	—Unverified	0
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding	Oct 29, 2024	Action RecognitionAction Segmentation	CodeCode Available	0
Few-Shot Multimodal Explanation for Visual Question Answering	Oct 28, 2024	Explainable artificial intelligenceExplainable Artificial Intelligence (XAI)	CodeCode Available	0
SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval	Oct 28, 2024	Information RetrievalMultilingual Named Entity Recognition	—Unverified	0
Large Language Model Benchmarks in Medical Tasks	Oct 28, 2024	Image CaptioningLanguage Modeling	—Unverified	0
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart	Oct 28, 2024	Question Answering	—Unverified	0
Resilience in Knowledge Graph Embeddings	Oct 28, 2024	Graph EmbeddingInformation Retrieval	—Unverified	0

Show:10 25 50

← PrevPage 155 of 1082Next →

All datasets SQuAD2.0 SQuAD1.1 HotpotQA PIQA BoolQ COPA TriviaQA SQuAD1.1 dev Natural Questions OpenBookQA TruthfulQA MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified