Visual Question Answering

MLLM Leaderboard

Papers

Showing 501–525 of 2177 papers

Title	Date	Tasks	Status	Hype
Emu3: Next-Token Prediction is All You Need	Sep 27, 2024	All	Code Available	3
Charting the Future: Using Chart Question-Answering for Scalable Evaluation of LLM-Driven Data Visualizations	Sep 27, 2024	Chart Question Answering, Question Answering	Unverified	0
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE	Sep 26, 2024	Image Classification	Code Available	1
Robotic Environmental State Recognition with Pre-Trained Vision-Language Models and Black-Box Optimization	Sep 26, 2024	Image to text, Image-to-Text Retrieval	Unverified	0
DARE: Diverse Visual Question Answering with Robustness Evaluation	Sep 26, 2024	Image Classification	Unverified	0
ZALM3: Zero-Shot Enhancement of Vision-Language Alignment via In-Context Information in Multi-Turn Multimodal Medical Dialogue	Sep 26, 2024	Medical Visual Question Answering, Question Answering	Unverified	0
A Unified Hallucination Mitigation Framework for Large Vision-Language Models	Sep 24, 2024	Hallucination, Question Answering	Code Available	0
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models	Sep 23, 2024	Medical Visual Question Answering, Question Answering	Code Available	1
Phantom of Latent for Large Language and Vision Models	Sep 23, 2024	Visual Question Answering	Code Available	2
Detect, Describe, Discriminate: Moving Beyond VQA for MLLM Evaluation	Sep 23, 2024	Multiple-choice, Question Answering	Unverified	0
Can CLIP Count Stars? An Empirical Study on Quantity Bias in CLIP	Sep 23, 2024	Image Generation, Question Answering	Unverified	0
@Bench: Benchmarking Vision-Language Models for Human-centered Assistive Technology	Sep 21, 2024	Benchmarking, Depth Estimation	Unverified	0
Evaluating Image Hallucination in Text-to-Image Generation with Question-Answering	Sep 19, 2024	Hallucination, Hallucination Evaluation	Code Available	1
Vision Language Models Can Parse Floor Plan Maps	Sep 19, 2024	Image Captioning, Question Answering	Unverified	0
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution	Sep 18, 2024	Natural Language Visual Grounding	Code Available	11
Sparks of Artificial General Intelligence(AGI) in Semiconductor Material Science: Early Explorations into the Next Frontier of Generative AI-Assisted Electron Micrograph Analysis	Sep 17, 2024	In-Context Learning, Question Answering	Unverified	0
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs	Sep 17, 2024	Question Answering, Token Reduction	Code Available	1
OneEncoder: A Lightweight Framework for Progressive Alignment of Modalities	Sep 17, 2024	cross-modal alignment, Question Answering	Unverified	0
CAST: Cross-modal Alignment Similarity Test for Vision Language Models	Sep 17, 2024	cross-modal alignment, Question Answering	Code Available	0
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training	Sep 15, 2024	Contrastive Learning, cross-modal alignment	Unverified	0
Explore the Hallucination on Low-level Perception for MLLMs	Sep 15, 2024	Hallucination, Question Answering	Unverified	0
One missing piece in Vision and Language: A Survey on Comics Understanding	Sep 14, 2024	document understanding, Image Classification	Code Available	2
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types	Sep 14, 2024	Language Modeling	Code Available	0
Learning to Compress Contexts for Efficient Knowledge-based Visual Question Answering	Sep 11, 2024	Question Answering, Visual Question Answering	Unverified	0
Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks	Sep 11, 2024	Image Captioning, Question Answering	Code Available	0

Datasets: MM-Vet, ViP-Bench, VQA v2 test-dev, BenchLMM, MMBench, V*bench, VQA v2 val, MSRVTT-QA, VQA v2 test-std, MMHal-Bench, MSVD-QA, PlotQA-D1

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	MMCTAgent (GPT-4 + GPT-4V)	GPT-4 score	74.24	—	Unverified
2	Qwen2-VL-72B	GPT-4 score	74	—	Unverified
3	InternVL2.5-78B	GPT-4 score	72.3	—	Unverified
4	GPT-4o +text rationale +IoT	GPT-4 score	72.2	—	Unverified
5	Lyra-Pro	GPT-4 score	71.4	—	Unverified
6	GLM-4V-Plus	GPT-4 score	71.1	—	Unverified
7	Phantom-7B	GPT-4 score	70.8	—	Unverified
8	InternVL2.5-38B	GPT-4 score	68.8	—	Unverified
9	InternVL2-26B (SGP, token ratio 64%)	GPT-4 score	65.6	—	Unverified
10	Baichuan-Omni (7B)	GPT-4 score	65.4	—	Unverified