Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics such as exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
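As a rough illustration of the EM and F1 metrics mentioned above, here is a minimal sketch modeled on the answer normalization commonly used for SQuAD-style extractive QA. It is not the official evaluation script: the function names (`normalize_answer`, `exact_match`, `f1_score`) are chosen for this example, and it assumes one reference answer per question and English text.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: two empty answers count as a match, otherwise a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "eiffel tower."))  # 1.0
    print(exact_match("in Paris, France", "Paris"))          # 0.0
    print(f1_score("in Paris, France", "Paris"))
```

EM only rewards a prediction that matches the reference after normalization, while F1 gives partial credit for overlapping tokens. Benchmarks with several reference answers per question usually take the maximum score over the references, which this sketch leaves out.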


Papers


Showing 826–850 of 10817 papers

Title	Date	Tasks	Status	Hype
Evolving Prompts In-Context: An Open-ended, Self-replicating Perspective	Jun 22, 2025	In-Context Learning, Large Language Model	Code Available	1
EXIT: Context-Aware Extractive Compression for Enhancing Retrieval-Augmented Generation	Dec 17, 2024	Question Answering, RAG	Code Available	1
AntifakePrompt: Prompt-Tuned Vision-Language Models are Fake Image Detectors	Oct 26, 2023	DeepFake Detection, Face Swapping	Code Available	1
Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering	Jul 22, 2023	Graph Representation Learning, Language Modeling	Code Available	1
Explainable Conversational Question Answering over Heterogeneous Sources via Iterative Graph Neural Networks	May 2, 2023	Conversational Question Answering, Question Answering	Code Available	1
DocNLI: A Large-scale Dataset for Document-level Natural Language Inference	Jun 17, 2021	Natural Language Inference, Question Answering	Code Available	1
Diversify Question Generation with Retrieval-Augmented Style Transfer	Oct 23, 2023	Diversity, Question Answering	Code Available	1
Exploring and Predicting Transferability across NLP Tasks	May 2, 2020	Language Modeling, Language Modelling	Code Available	1
Exploring Dual Encoder Architectures for Question Answering	Apr 14, 2022	Information Retrieval, Question Answering	Code Available	1
Exploring Perceptual Limitation of Multimodal Large Language Models	Feb 12, 2024	Object, Question Answering	Code Available	1
Ditch the Gold Standard: Re-evaluating Conversational Question Answering	Dec 16, 2021	Conversational Question Answering, Question Answering	Code Available	1
Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents	Mar 6, 2022	Community Question Answering, Information Retrieval	Code Available	1
Exploring the Benefits of Training Expert Language Models over Instruction Tuning	Feb 7, 2023	Common Sense Reasoning, Coreference Resolution	Code Available	1
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge	Jun 3, 2022	Question Answering, Visual Question Answering	Code Available	1
Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data	Oct 7, 2020	Attribute, Question Answering	Code Available	1
Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments	Dec 19, 2022	In-Context Learning, Knowledge Base Question Answering	Code Available	1
Extracting Definienda in Mathematical Scholarly Articles with Transformers	Nov 21, 2023	Articles, Language Modeling	Code Available	1
FaceBench: A Multi-View Multi-Level Facial Attribute VQA Dataset for Benchmarking Face Perception MLLMs	Mar 27, 2025	Attribute, Benchmarking	Code Available	1
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators	Jun 19, 2024	Fact Verification, Question Answering	Code Available	1
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	Oct 2, 2019	Hate Speech Detection, Knowledge Distillation	Code Available	1
Distantly-Supervised Evidence Retrieval Enables Question Answering without Evidence Annotation	Oct 10, 2021	Open-Domain Question Answering, Question Answering	Code Available	1
A Hitchhikers Guide to Fine-Grained Face Forgery Detection Using Common Sense Reasoning	Oct 1, 2024	Common Sense Reasoning, DeepFake Detection	Code Available	1
Fauno: The Italian Large Language Model that will leave you senza parole!	Jun 26, 2023	GPU, Language Modeling	Code Available	1
ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences	Jun 22, 2021	Argument Mining, Decoder	Code Available	1
Distilled Dual-Encoder Model for Vision-Language Understanding	Dec 16, 2021	Image to text, model	Code Available	1



Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified