SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1–10 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | | 0 |
| Evaluating Attribute Confusion in Fashion Text-to-Image Generation | | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | | 0 |
| ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | | 0 |
| Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Code | 0 |
| SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0 |

Benchmark Results

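| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | GPT-4V | GPT-3.5 score | 58.37 | | Unverified |
| 2 | Sphinx-V2-1K | GPT-3.5 score | 57.43 | | Unverified |
| 3 | LLaVA-1.5-13B | GPT-3.5 score | 55.53 | | Unverified |
| 4 | LLaVA-1.5-7B | GPT-3.5 score | 46.83 | | Unverified |
| 5 | InstructBLIP-13B | GPT-3.5 score | 45.03 | | Unverified |
| 6 | InstructBLIP-7B | GPT-3.5 score | 44.63 | | Unverified |
| 7 | LLaVA-1-13B | GPT-3.5 score | 43.5 | | Unverified |
| 8 | Otter-7B | GPT-3.5 score | 39.13 | | Unverified |
| 9 | MiniGPT4-13B | GPT-3.5 score | 34.93 | | Unverified |
| 10 | MiniGPTv2-7B | GPT-3.5 score | 30.1 | | Unverified |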