SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1-10 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1 |
| Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights | | 0 |
| MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning | | 0 |
| Evaluating Attribute Confusion in Fashion Text-to-Image Generation | | 0 |
| LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation | | 0 |
| Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling | | 0 |
| ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | | 0 |
| Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Code | 0 |
| SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning | | 0 |
| DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | CoCa | Accuracy | 82.3 | | Unverified |
| 2 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | Accuracy | 82.3 | | Unverified |
| 3 | OFA | Accuracy | 82 | | Unverified |
| 4 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | Accuracy | 81.74 | | Unverified |
| 5 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | Accuracy | 81.66 | | Unverified |
| 6 | mPLUG-2 | Accuracy | 81.11 | | Unverified |
| 7 | Florence | Accuracy | 80.16 | | Unverified |
| 8 | Aurora (ours, r=64) | Accuracy | 77.69 | | Unverified |
| 9 | VK-OOD | Accuracy | 76.8 | | Unverified |
| 10 | LXMERT (low-magnitude pruning) | Accuracy | 70.72 | | Unverified |