Question Answering

Question answering can be segmented into domain-specific tasks such as community question answering and knowledge-base question answering. Popular benchmark datasets for evaluating question answering systems include SQuAD, HotpotQA, bAbI, TriviaQA, WikiQA, and many others. Question answering models are typically evaluated with exact match (EM) and F1 score. Some recent top-performing models are T5 and XLNet.
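EM and F1 are usually computed per question after normalizing both the predicted and the gold answer strings. The sketch below assumes SQuAD-style normalization (lowercasing, stripping punctuation and the English articles "a", "an", "the", collapsing whitespace). Other benchmarks may normalize differently. The helper names (`normalize_answer`, `best_over_golds`) are illustrative, not a specific library's API.

```python
import re
import string
from collections import Counter

# Assumption: SQuAD-style answer normalization; other datasets may differ.
_PUNCT = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and the articles a/an/the, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over normalized tokens."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # If either side is empty (e.g. an unanswerable question), score 1.0 only
    # when both are empty, mirroring the convention used for SQuAD2.0.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_golds(metric, prediction: str, golds: list[str]) -> float:
    """When a question has several reference answers, keep the best score."""
    return max(metric(prediction, g) for g in golds)


if __name__ == "__main__":
    golds = ["Eiffel Tower", "the Eiffel Tower in Paris"]
    pred = "The Eiffel Tower!"
    print("EM:", best_over_golds(exact_match, pred, golds))
    print("F1:", best_over_golds(f1_score, pred, golds))
```

A dataset-level EM or F1 score is then the mean of these per-question scores, usually reported as a percentage.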


Papers


Showing 501–550 of 10817 papers

Title	Date	Tasks	Status	Hype
GIT: A Generative Image-to-text Transformer for Vision and Language	May 27, 2022	Decoder, Image Captioning	Code Available	2
GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI	Aug 6, 2024	Question Answering, Visual Question Answering	Code Available	2
GeoChat: Grounded Large Vision-Language Model for Remote Sensing	Nov 24, 2023	Instruction Following, Language Modeling	Code Available	2
GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering	Feb 4, 2024	Language Modeling, Language Modelling	Code Available	2
GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI	Nov 21, 2024	Decision Making, Language Modeling	Code Available	2
Generate-on-Graph: Treat LLM as both Agent and KG in Incomplete Knowledge Graph Question Answering	Apr 23, 2024	Graph Question Answering, Hallucination	Code Available	2
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities	Jun 17, 2024	Audio Question Answering, Instruction Following	Code Available	2
From Redundancy to Relevance: Information Flow in LVLMs Across Reasoning Tasks	Jun 4, 2024	Image Captioning, Language Modelling	Code Available	2
Frozen Transformers in Language Models Are Effective Visual Encoder Layers	Oct 19, 2023	Action Recognition, Image-text Retrieval	Code Available	2
A Multimodal Benchmark Dataset and Model for Crop Disease Diagnosis	Mar 10, 2025	Question Answering	Code Available	2
A Survey on Benchmarks of Multimodal Large Language Models	Aug 16, 2024	Question Answering, Survey	Code Available	2
VHM: Versatile and Honest Vision Language Model for Remote Sensing Image Analysis	Mar 29, 2024	Hallucination, Image Captioning	Code Available	2
FreeVA: Offline MLLM as Training-Free Video Assistant	May 13, 2024	Fairness, Question Answering	Code Available	2
F-LMM: Grounding Frozen Large Multimodal Models	Jun 9, 2024	General Knowledge, Instruction Following	Code Available	2
Free Video-LLM: Prompt-guided Visual Perception for Efficient Training-free Video LLMs	Oct 14, 2024	Computational Efficiency, Question Answering	Code Available	2
FlagEvalMM: A Flexible Framework for Comprehensive Multimodal Model Evaluation	Jun 10, 2025	Image-text Retrieval, Question Answering	Code Available	2
Fine-grained Late-interaction Multi-modal Retrieval for Retrieval Augmented Visual Question Answering	Sep 29, 2023	Image to text, Passage Retrieval	Code Available	2
Ask Me Anything: A simple strategy for prompting language models	Oct 5, 2022	Coreference Resolution, Natural Language Inference	Code Available	2
From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models	Oct 13, 2023	Hallucination, Image Captioning	Code Available	2
Atlas: Few-shot Learning with Retrieval Augmented Language Models	Aug 5, 2022	Fact Checking, Few-Shot Learning	Code Available	2
FanOutQA: A Multi-Hop, Multi-Document Question Answering Benchmark for Large Language Models	Feb 21, 2024	Question Answering	Code Available	2
FinBERT-QA: Financial Question Answering with pre-trained BERT Language Models	Apr 24, 2025	Answer Selection, Information Retrieval	Code Available	2
Evaluating RAG-Fusion with RAGElo: an Automated Elo-based Framework	Jun 20, 2024	Hallucination, Question Answering	Code Available	2
Explore the Limits of Omni-modal Pretraining at Scale	Jun 13, 2024	Language Modeling, Language Modelling	Code Available	2
A Replication Study of Dense Passage Retriever	Apr 12, 2021	Open-Domain Question Answering, Question Answering	Code Available	2
Evaluating LLM Reasoning in the Operations Research Domain with ORQA	Dec 22, 2024	Question Answering	Code Available	2
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer	Oct 23, 2019	Answer Generation, Common Sense Reasoning	Code Available	2
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training	Jun 2, 2023	Language Modeling, Language Modelling	Code Available	2
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM	Mar 6, 2025	Anomaly Detection, Language Modeling	Code Available	2
ERA-CoT: Improving Chain-of-Thought through Entity Relationship Analysis	Mar 11, 2024	Question Answering	Code Available	2
EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis	Sep 10, 2024	Contrastive Learning, Cross-Modal Retrieval	Code Available	2
FakeBench: Probing Explainable Fake Image Detection via Large Multimodal Models	Apr 20, 2024	Binary Classification, Fake Image Detection	Code Available	2
End-to-End Navigation with Vision Language Models: Transforming Spatial Reasoning into Question-Answering	Nov 8, 2024	Language Modeling, Language Modelling	Code Available	2
Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning	Mar 6, 2024	Multimodal Reasoning, Question Answering	Code Available	2
End-To-End Memory Networks	Mar 31, 2015	Language Modeling, Language Modelling	Code Available	2
Enhancing Visual-Language Modality Alignment in Large Vision Language Models via Self-Improvement	May 24, 2024	Hallucination, Image Comprehension	Code Available	2
AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents	Jul 5, 2024	Decision Making, Multi-hop Question Answering	Code Available	2
FinMem: A Performance-Enhanced LLM Trading Agent with Layered Memory and Character Design	Nov 23, 2023	Decision Making, Language Modelling	Code Available	2
A Simple Aerial Detection Baseline of Multimodal Language Models	Jan 16, 2025	object-detection, Object Detection	Code Available	2
EHRAgent: Code Empowers Large Language Models for Few-shot Complex Tabular Reasoning on Electronic Health Records	Jan 13, 2024	Code Generation, Few-Shot Learning	Code Available	2
FortisAVQA and MAVEN: a Benchmark Dataset and Debiasing Framework for Robust Multimodal Reasoning	Apr 1, 2025	Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)	Code Available	2
FrameFusion: Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models	Dec 30, 2024	Question Answering, Token Reduction	Code Available	2
EmbodiedEval: Evaluate Multimodal LLMs as Embodied Agents	Jan 21, 2025	Attribute, Question Answering	Code Available	2
A Pilot Study for Chinese SQL Semantic Parsing	Sep 29, 2019	Cross-Lingual Word Embeddings, Question Answering	Code Available	2
Egocentric Video-Language Pretraining	Jun 3, 2022	Action Recognition, Contrastive Learning	Code Available	2
Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning	May 27, 2024	Question Answering, RAG	Code Available	2
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding	Sep 26, 2024	Question Answering, Video Understanding	Code Available	2
EduChat: A Large-Scale Language Model-based Chatbot System for Intelligent Education	Aug 5, 2023	Chatbot, Language Modeling	Code Available	2
Efficient One-Pass End-to-End Entity Linking for Questions	Oct 6, 2020	CPU, Entity Linking	Code Available	2
Dual Diffusion for Unified Image Generation and Understanding	Dec 31, 2024	Image Generation, Language Modeling	Code Available	2


Datasets: SQuAD2.0, SQuAD1.1, HotpotQA, PIQA, BoolQ, COPA, TriviaQA, SQuAD1.1 dev, Natural Questions, OpenBookQA, TruthfulQA, MultiRC

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IE-Net (ensemble)	EM	90.94	—	Unverified
2	FPNet (ensemble)	EM	90.87	—	Unverified
3	IE-NetV2 (ensemble)	EM	90.86	—	Unverified
4	SA-Net on Albert (ensemble)	EM	90.72	—	Unverified
5	SA-Net-V2 (ensemble)	EM	90.68	—	Unverified
6	FPNet (ensemble)	EM	90.6	—	Unverified
7	Retro-Reader (ensemble)	EM	90.58	—	Unverified
8	EntitySpanFocusV2 (ensemble)	EM	90.52	—	Unverified
9	TransNets + SFVerifier + SFEnsembler (ensemble)	EM	90.49	—	Unverified
10	EntitySpanFocus+AT (ensemble)	EM	90.45	—	Unverified