Question Answering

Question answering can be segmented into domain-specific tasks like community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Models for question answering are typically evaluated on metrics like exact match (EM) and F1. Some recent top-performing models are T5 and XLNet.
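As a rough illustration of the two metrics, the sketch below computes exact match and token-level F1 for a single predicted answer against a single reference. It assumes SQuAD-style answer normalization (lowercasing, removing punctuation and English articles, collapsing whitespace); the function names are hypothetical, and other benchmarks may normalize or aggregate differently.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: two empty answers count as a match, one empty as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The Eiffel Tower", "Eiffel Tower"))
    print(token_f1("Eiffel Tower in Paris", "the Eiffel Tower"))
```

Leaderboard figures are usually these per-question scores averaged over the evaluation set and reported on a 0 to 100 scale. When a question has several reference answers, a common choice is to take the maximum score over the references.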

Papers

Showing 251–300 of 10817 papers

Title	Date	Tasks	Status	Hype	Score
Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding -- A Survey	Feb 27, 2024	Language Modeling, Language Modelling	Code Available	2	5
ActiveRAG: Autonomously Knowledge Assimilation and Accommodation through Retrieval-Augmented Agents	Feb 21, 2024	Active Learning, Position	Code Available	2	5
Knowledge Representation Learning: A Quantitative Review	Dec 28, 2018	General Classification, Information Retrieval	Code Available	2	5
MASKSEARCH: A Universal Pre-Training Framework to Enhance Agentic Search Capability	May 26, 2025	Multi-hop Question Answering, Question Answering	Code Available	2	5
ktrain: A Low-Code Library for Augmented Machine Learning	Apr 19, 2020	BIG-bench Machine Learning, Classification	Code Available	2	5
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions	Apr 27, 2023	Common Sense Reasoning, Coreference Resolution	Code Available	2	5
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning	Jun 17, 2024	Data Augmentation, Mathematical Reasoning	Code Available	2	5
KG-Rank: Enhancing Large Language Models for Medical QA with Knowledge Graphs and Ranking Techniques	Mar 9, 2024	Knowledge Graphs, Long Form Question Answering	Code Available	2	5
KET-RAG: A Cost-Efficient Multi-Granular Indexing Framework for Graph-RAG	Feb 13, 2025	Knowledge Graphs, Large Language Model	Code Available	2	5
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM	Mar 6, 2025	Anomaly Detection, Language Modeling	Code Available	2	5
Breaking the Ceiling of the LLM Community by Treating Token Generation as a Classification for Ensembling	Jun 18, 2024	Arithmetic Reasoning, Language Modeling	Code Available	2	5
Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model	Mar 6, 2025	General Knowledge, Image Captioning	Code Available	2	5
Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning	Sep 19, 2024	Language Modeling, Language Modelling	Code Available	2	5
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions	Aug 19, 2023	MME, Optical Character Recognition (OCR)	Code Available	2	5
JourneyDB: A Benchmark for Generative Image Understanding	Jul 3, 2023	Image Captioning, Image Comprehension	Code Available	2	5
Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions	Dec 20, 2022	Hallucination, Question Answering	Code Available	2	5
A Pilot Study for Chinese SQL Semantic Parsing	Sep 29, 2019	Cross-Lingual Word Embeddings, Question Answering	Code Available	2	5
ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO	Jun 17, 2024	Language Modelling, Question Answering	Code Available	2	5
Knowledge Graph Prompting for Multi-Document Question Answering	Aug 22, 2023	graph construction, Open-Domain Question Answering	Code Available	2	5
Learning Dense Representations of Phrases at Scale	Dec 23, 2020	Open-Domain Question Answering, Question Answering	Code Available	2	5
Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models	Jan 27, 2024	Medical Question Answering, Multiple-choice	Code Available	2	5
Improving Retrieval-Augmented Generation through Multi-Agent Reinforcement Learning	Jan 25, 2025	Answer Generation, Multi-agent Reinforcement Learning	Code Available	2	5
Hyena Hierarchy: Towards Larger Convolutional Language Models	Feb 21, 2023	2k, 8k	Code Available	2	5
BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains	Feb 15, 2024	Few-Shot Learning, Medical Question Answering	Code Available	2	5
IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages	Apr 25, 2024	Cross-Lingual Question Answering, Diversity	Code Available	2	5
Hungry Hungry Hippos: Towards Language Modeling with State Space Models	Dec 28, 2022	8k, Coreference Resolution	Code Available	2	5
Huatuo-26M, a Large-scale Chinese Medical QA Dataset	May 2, 2023	Language Modeling, Language Modelling	Code Available	2	5
Blended RAG: Improving RAG (Retriever-Augmented Generation) Accuracy with Semantic Search and Hybrid Query-Based Retrievers	Mar 22, 2024	Information Retrieval	Code Available	2	5
How Much are Large Language Models Contaminated? A Comprehensive Survey and the LLMSanitize Library	Mar 31, 2024	Question Answering	Code Available	2	5
Hybrid Transformer with Multi-level Fusion for Multimodal Knowledge Graph Completion	May 4, 2022	Information Retrieval, Knowledge Graph Completion	Code Available	2	5
BiomedGPT: A Generalist Vision-Language Foundation Model for Diverse Biomedical Tasks	May 26, 2023	Image Captioning, Medical Visual Question Answering	Code Available	2	5
HMT: Hierarchical Memory Transformer for Long Context Language Processing	May 9, 2024	Language Modeling, Language Modelling	Code Available	2	5
500xCompressor: Generalized Prompt Compression for Large Language Models	Aug 6, 2024	Language Modelling, Large Language Model	Code Available	2	5
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis	Mar 29, 2024	Hallucination, Image Captioning	Code Available	2	5
Habitat: A Platform for Embodied AI Research	Apr 2, 2019	Benchmarking, GPU	Code Available	2	5
BlendSQL: A Scalable Dialect for Unifying Hybrid Question Answering in Relational Algebra	Feb 27, 2024	Question Answering	Code Available	2	5
Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment	Nov 26, 2024	Image Quality Assessment, Question Answering	Code Available	2	5
How do you know that? Teaching Generative Language Models to Reference Answers to Biomedical Questions	Jul 6, 2024	Question Answering, RAG	Code Available	2	5
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions	Jan 24, 2024	document understanding, Question Answering	Code Available	2	5
Learning to Filter Context for Retrieval-Augmented Generation	Nov 14, 2023	Extractive Question-Answering, Fact Verification	Code Available	2	5
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension	Mar 12, 2024	Deblurring, Decoder	Code Available	2	5
3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment	Aug 8, 2023	3D Question Answering (3D-QA), Dense Captioning	Code Available	2	5
GOFA: A Generative One-For-All Model for Joint Graph Language Modeling	Jul 12, 2024	All, Language Modeling	Code Available	2	5
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM	Mar 27, 2024	Language Modeling, Language Modelling	Code Available	2	5
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI	Aug 6, 2024	Question Answering, Visual Question Answering	Code Available	2	5
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2	5
GeoChat: Grounded Large Vision-Language Model for Remote Sensing	Nov 24, 2023	Instruction Following, Language Modeling	Code Available	2	5
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	Decoder, Image Captioning	Code Available	2	5
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI	Nov 21, 2024	Decision Making, Language Modeling	Code Available	2	5
GraphTranslator: Aligning Graph Model to Large Language Model for Open-ended Tasks	Feb 11, 2024	Graph Question Answering, Instruction Following	Code Available	2	5

Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified