Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
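As a rough illustration of the two metrics named above, the sketch below computes exact match and token-level F1 for a single predicted answer against a single reference. It assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace); the function names normalize_answer, exact_match, and f1_score are hypothetical and not taken from any particular benchmark's official scorer.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: this mirrors the normalization commonly used for
    SQuAD-style evaluation; other datasets may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower."))
    print(f1_score("in Paris France", "Paris"))
```

Benchmark scores are usually these per-question values averaged over the dataset and, when a question has several reference answers, maximized over the references; that aggregation is omitted here.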

Papers

Showing 1526–1550 of 10817 papers

Title	Date	Tasks	Status	Hype
Nearest Neighbor Normalization Improves Multimodal Retrieval	Oct 31, 2024	Cross-Modal Retrieval, Image Captioning	Code Available	1
LEAF: Learning and Evaluation Augmented by Fact-Checking to Improve Factualness in Large Language Models	Oct 31, 2024	Fact Checking, Medical Question Answering	Unverified	0
MDCure: A Scalable Pipeline for Multi-Document Instruction-Following	Oct 30, 2024	Articles, Instruction Following	Code Available	0
Symbolic Graph Inference for Compound Scene Understanding	Oct 30, 2024	Question Answering, Scene Understanding	Unverified	0
SimpsonsVQA: Enhancing Inquiry-Based Learning with a Tailored Dataset	Oct 30, 2024	Question Answering, Visual Question Answering	Unverified	0
Danoliteracy of Generative, Large Language Models	Oct 30, 2024	Question Answering	Unverified	0
Dynamic Strategy Planning for Efficient Question Answering with Large Language Models	Oct 30, 2024	Multi-hop Question Answering, Question Answering	Unverified	0
Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings	Oct 30, 2024	Question Answering, Uncertainty Quantification	Unverified	0
BUZZ: Beehive-structured Sparse KV Cache with Segmented Heavy Hitters for Efficient LLM Inference	Oct 30, 2024	Computational Efficiency, Question Answering	Code Available	0
Multi-Agent Large Language Models for Conversational Task-Solving	Oct 30, 2024	Fairness, Question Answering	Code Available	2
NeuroSym-BioCAT: Leveraging Neuro-Symbolic Methods for Biomedical Scholarly Document Categorization and Question Answering	Oct 29, 2024	Question Answering	Unverified	0
GRADE: Quantifying Sample Diversity in Text-to-Image Models	Oct 29, 2024	Attribute, Diversity	Unverified	0
RealCQA-V2: Visual Premise Proving A Manual COT Dataset for Charts	Oct 29, 2024	Chart Question Answering, Question Answering	Unverified	0
AAAR-1.0: Assessing AI's Potential to Assist Research	Oct 29, 2024	Question Answering	Unverified	0
Synthetic Data Generation with Large Language Models for Personalized Community Question Answering	Oct 29, 2024	Community Question Answering, Information Retrieval	Code Available	0
Distinguishing Ignorance from Error in LLM Hallucinations	Oct 29, 2024	Hallucination, Question Answering	Code Available	1
Are VLMs Really Blind	Oct 29, 2024	Language Modeling, Language Modelling	Code Available	0
Knowledge-Guided Prompt Learning for Request Quality Assurance in Public Code Review	Oct 29, 2024	Prompt Learning, Question Answering	Code Available	0
Enhancing Financial Question Answering with a Multi-Agent Reflection Framework	Oct 29, 2024	Question Answering	Unverified	0
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding	Oct 29, 2024	Action Recognition, Action Segmentation	Code Available	0
Few-Shot Multimodal Explanation for Visual Question Answering	Oct 28, 2024	Explainable artificial intelligence, Explainable Artificial Intelligence (XAI)	Code Available	0
SandboxAQ's submission to MRL 2024 Shared Task on Multi-lingual Multi-task Information Retrieval	Oct 28, 2024	Information Retrieval, Multilingual Named Entity Recognition	Unverified	0
Large Language Model Benchmarks in Medical Tasks	Oct 28, 2024	Image Captioning, Language Modeling	Unverified	0
CT2C-QA: Multimodal Question Answering over Chinese Text, Table and Chart	Oct 28, 2024	Question Answering	Unverified	0
Resilience in Knowledge Graph Embeddings	Oct 28, 2024	Graph Embedding, Information Retrieval	Unverified	0

All datasets SQuAD2.0 SQuAD1.1 HotpotQA PIQA BoolQ COPA TriviaQA SQuAD1.1 dev Natural Questions OpenBookQA TruthfulQA MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified