SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 110 of 2177 papers

TitleStatusHype
Describe Anything Model for Visual Question Answering on Text-rich ImagesCode1
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights0
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning0
Evaluating Attribute Confusion in Fashion Text-to-Image Generation0
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation0
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding0
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language ModelsCode0
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document ImagesCode0
Show:102550
← PrevPage 1 of 218Next →

Benchmark Results

#ModelMetricClaimedVerifiedStatus
1LLaVA-InternLM2-ViT + MoSLoRAGPT-3.5 score73.8Unverified
2CuMo-7BGPT-3.5 score73Unverified
3LLaVA-LLaMA3-8B-ViT + MoSLoRAGPT-3.5 score73Unverified
4Video-LaVITGPT-3.5 score67.3Unverified
5DreamLLM-7BGPT-3.5 score49.9Unverified