Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
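
The two metrics named above can be sketched in a few lines. This is a minimal illustration, assuming the SQuAD-style convention of lowercasing answers and stripping punctuation and English articles before comparison; the function names (normalize_answer, exact_match, f1_score) are chosen here for illustration and are not taken from any specific benchmark's official scorer.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: two empty answers count as a match, one empty as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

With these definitions, the prediction "in Paris, France" against the reference "Paris" scores 0 on EM and 0.5 on F1 (precision 1/3, recall 1). When a question has several reference answers, a common choice is to take the maximum score over the references; that step is left out of this sketch.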


Papers


Showing 201–225 of 10817 papers

Title	Date	Tasks	Status	Hype	Score
Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams	Jun 12, 2024	cross-modal alignment, Language Modelling	Code Available	3	5
TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones	Dec 28, 2023	Computational Efficiency, Image Captioning	Code Available	3	5
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning	Jan 25, 2025	Answer Generation, Multi-agent Reinforcement Learning	Code Available	2	5
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models	Jan 27, 2024	Medical Question Answering, Multiple-choice	Code Available	2	5
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages	Apr 25, 2024	Cross-Lingual Question Answering, Diversity	Code Available	2	5
Hyena Hierarchy: Towards Larger Convolutional Language Models	Feb 21, 2023	2k, 8k	Code Available	2	5
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding	Jul 2, 2024	document understanding, Key Information Extraction	Code Available	2	5
Beyond Accuracy: Behavioral Testing of NLP models with CheckList	May 8, 2020	Question Answering, Sentiment Analysis	Code Available	2	5
Huatuo-26M, a Large-scale Chinese Medical QA Dataset	May 2, 2023	Language Modeling, Language Modelling	Code Available	2	5
How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library	Mar 31, 2024	Question Answering	Code Available	2	5
Hungry Hungry Hippos: Towards Language Modeling with State Space Models	Dec 28, 2022	8k, Coreference Resolution	Code Available	2	5
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion	May 4, 2022	Information Retrieval, Knowledge Graph Completion	Code Available	2	5
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions	Jan 24, 2024	document understanding, Question Answering	Code Available	2	5
HMT: Hierarchical Memory Transformer for Long Context Language Processing	May 9, 2024	Language Modeling, Language Modelling	Code Available	2	5
Habitat: A Platform for Embodied AI Research	Apr 2, 2019	Benchmarking, GPU	Code Available	2	5
SemViQA: A Semantic Question Answering System for Vietnamese Information Fact-Checking	Mar 2, 2025	Fact Checking, Fact Verification	Code Available	2	5
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension	Mar 12, 2024	Deblurring, Decoder	Code Available	2	5
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling, Language Modelling	Code Available	2	5
Benchmarking Retrieval-Augmented Generation in Multi-Modal Contexts	Feb 24, 2025	Benchmarking, Fact Verification	Code Available	2	5
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis	Mar 29, 2024	Hallucination, Image Captioning	Code Available	2	5
How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions	Jul 6, 2024	Question Answering, RAG	Code Available	2	5
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions	Dec 20, 2022	Hallucination, Question Answering	Code Available	2	5
BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models	Apr 17, 2021	Argument Retrieval, Benchmarking	Code Available	2	5
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks	Feb 11, 2024	Graph Question Answering, Instruction Following	Code Available	2	5
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling	Jul 12, 2024	All, Language Modeling	Code Available	2	5


Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified