SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1–10 of 2177 papers

Title | Status | Hype
Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | - | 0
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | - | 0
Evaluating Attribute Confusion in Fashion Text-to-Image Generation | - | 0
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | - | 0
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | - | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | - | 0
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Code | 0
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | - | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | Florence | overall | 80.36 | - | Unverified
2 | OFA | number | 71.44 | - | Unverified
3 | LXMERT (low-magnitude pruning) | Accuracy | 70.87 | - | Unverified