Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
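
As a rough illustration of the two metrics named above, here is a minimal Python sketch of SQuAD-style exact match and token-level F1. It assumes the commonly used normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace). The function names are hypothetical and not taken from any particular library, and individual benchmarks may normalize answers differently.

```python
import re
import string
from collections import Counter

_PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; only one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction: str, references: list[str]) -> float:
    """Score against several gold answers and keep the best one."""
    return max(metric(prediction, ref) for ref in references)
```

When a question has several gold answers, the usual practice is to take the maximum score over them, as `best_over_references` does, and then average over all questions to get the reported EM and F1.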

Papers

Showing 9951–10000 of 10817 papers

Title	Date	Tasks	Status
AMQA: An Adversarial Dataset for Benchmarking Bias of LLMs in Medicine and Healthcare	May 26, 2025	Benchmarking, Medical Diagnosis	Code Available
Evaluating Semantic Parsing against a Simple Web-based Question Answering Model	Jul 14, 2017	Question Answering, Semantic Parsing	Code Available
CMQA: A Dataset of Conditional Question Answering with Multiple-Span Answers	Oct 1, 2022	Question Answering	Code Available
C-MORE: Pretraining to Answer Open-Domain Questions by Consulting Millions of References	Mar 16, 2022	Open-Domain Question Answering, Question Answering	Code Available
CMDBench: A Benchmark for Coarse-to-fine Multimodal Data Discovery in Compound AI Systems	Jun 2, 2024	Question Answering	Code Available
Evaluating Prompt-based Question Answering for Object Prediction in the Open Research Knowledge Graph	May 22, 2023	General Knowledge, Question Answering	Code Available
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models	Jul 9, 2024	coreference-resolution, Coreference Resolution	Code Available
Relation-Aware Question Answering for Heterogeneous Knowledge Graphs	Dec 19, 2023	Knowledge Base Question Answering, Knowledge Graphs	Code Available
Adding Gradient Noise Improves Learning for Very Deep Networks	Nov 21, 2015	Question Answering	Code Available
Listen Then See: Video Alignment with Speaker Attention	Apr 21, 2024	cross-modal alignment, Question Answering	Code Available
PRIV-QA: Privacy-Preserving Question Answering for Cloud Large Language Models	Feb 19, 2025	Open-Ended Question Answering, Privacy Preserving	Code Available
CluMo: Cluster-based Modality Fusion Prompt for Continual Learning in Visual Question Answering	Aug 21, 2024	Continual Learning, Question Answering	Code Available
Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering	May 5, 2020	Extractive Question-Answering, Question Answering	Code Available
Evaluating Natural Language Understanding Services for Conversational Question Answering Systems	Aug 1, 2017	Chatbot, Conversational Question Answering	Code Available
NIPS Conversational Intelligence Challenge 2017 Winner System: Skill-based Conversational Agent with Supervised Dialog Manager	Aug 1, 2018	Goal-Oriented Dialog, Goal-Oriented Dialogue Systems	Code Available
NIR-Prompt: A Multi-task Generalized Neural Information Retrieval Training Framework	Dec 1, 2022	Information Retrieval, Open-Domain Question Answering	Code Available
Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression	May 1, 2024	Language Modeling, Language Modelling	Code Available
NitiBench: A Comprehensive Studies of LLM Frameworks Capabilities for Thai Legal Question Answering	Feb 15, 2025	Chunking, Information Retrieval	Code Available
LittleMu: Deploying an Online Virtual Teaching Assistant via Heterogeneous Sources Integration and Chain of Teach Prompts	Aug 11, 2023	Language Modelling, Question Answering	Code Available
LiveQA: A Question Answering Dataset over Sports Live	Oct 1, 2020	Multiple-choice, Question Answering	Code Available
R^3: Reinforced Reader-Ranker for Open-Domain Question Answering	Aug 31, 2017	Answer Generation, Information Retrieval	Code Available
Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering	May 22, 2025	Global Facts, Language Modeling	Code Available
NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian	Dec 3, 2023	Natural Language Understanding, Question Answering	Code Available
Fine-Grained Stateful Knowledge Exploration: A Novel Paradigm for Integrating Knowledge Graphs with Large Language Models	Jan 24, 2024	Knowledge Base Question Answering, Knowledge Graphs	Code Available
Probabilistic Embeddings for Frozen Vision-Language Models: Uncertainty Quantification with Gaussian Process Latent Variable Models	May 8, 2025	Active Learning, cross-modal alignment	Code Available
Evaluating Large Language Models in Semantic Parsing for Conversational Question Answering over Knowledge Graphs	Jan 3, 2024	Conversational Question Answering, Information Retrieval	Code Available
Evaluating Fairness in Large Vision-Language Models Across Diverse Demographic Attributes and Prompts	Jun 25, 2024	Fairness, Question Answering	Code Available
Evaluating Explanations: How much do explanations from the teacher aid students?	Dec 1, 2020	Question Answering, text-classification	Code Available
Evaluating Dependencies in Fact Editing for Language Models: Specificity and Implication Awareness	Dec 4, 2023	knowledge editing, Language Modeling	Code Available
Evaluating Coreference Resolvers on Community-based Question Answering: From Rule-based to State of the Art	Oct 1, 2022	Answer Selection, coreference-resolution	Code Available
Evaluating Commonsense in Pre-trained Language Models	Nov 27, 2019	Language Modeling, Language Modelling	Code Available
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory	Dec 10, 2022	Image Captioning, Language Modeling	Code Available
Llama SLayer 8B: Shallow Layers Hold the Key to Knowledge Injection	Oct 3, 2024	Math, parameter-efficient fine-tuning	Code Available
Evaluating Attribute Comprehension in Large Vision-Language Models	Aug 25, 2024	Attribute, Image-text matching	Code Available
Relation Extraction : A Survey	Dec 14, 2017	Articles, Information Retrieval	Code Available
NLPContributions: An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature	Jun 23, 2020	Articles, Machine Translation	Code Available
EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque	Apr 18, 2024	Question Answering	Code Available
ETPC - A Paraphrase Identification Corpus Annotated with Extended Paraphrase Typology and Negation	May 1, 2018	Natural Language Inference, Negation	Code Available
Ethical-Advice Taker: Do Language Models Understand Natural Language Interventions?	Jun 2, 2021	Ethics, Few-Shot Learning	Code Available
ERVQA: A Dataset to Benchmark the Readiness of Large Vision Language Models in Hospital Environments	Oct 8, 2024	Decoder, Question Answering	Code Available
Closed-book Question Generation via Contrastive Learning	Oct 13, 2022	Contrastive Learning, Natural Questions	Code Available
ERNIE-Layout: Layout-Knowledge Enhanced Multi-modal Pre-training for Document Understanding	Jan 16, 2022	cross-modal alignment, Document Classification	Code Available
ERNIE-Doc: A Retrospective Long-Document Modeling Transformer	Dec 31, 2020	Language Modeling, Language Modelling	Code Available
EQuANt (Enhanced Question Answer Network)	Jun 24, 2019	Machine Reading Comprehension, Multi-Task Learning	Code Available
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care	May 1, 2025	Language Modeling, Language Modelling	Code Available
Reasoning over Uncertain Text by Generative Large Language Models	Feb 14, 2024	Decision Making, Mathematical Reasoning	Code Available
EQA-RM: A Generative Embodied Reward Model with Test-time Scaling	Jun 12, 2025	Embodied Question Answering, Question Answering	Code Available
NLProlog: Reasoning with Weak Unification for Question Answering in Natural Language	Jun 14, 2019	Question Answering, Sentence	Code Available
LLM-as-a-Judge: Reassessing the Performance of LLMs in Extractive QA	Apr 16, 2025	Question Answering, Reading Comprehension	Code Available
ClinKD: Cross-Modal Clinical Knowledge Distiller For Multi-Task Medical Images	Feb 9, 2025	Clinical Knowledge, Medical Visual Question Answering	Code Available


Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified