SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1–10 of 2177 papers

Title | Status | Hype
Describe Anything Model for Visual Question Answering on Text-rich Images | Code | 1
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights |  | 0
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning |  | 0
Evaluating Attribute Confusion in Fashion Text-to-Image Generation |  | 0
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation |  | 0
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling |  | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding |  | 0
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models | Code | 0
SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning |  | 0
DrishtiKon: Multi-Granular Visual Grounding for Text-Rich Document Images | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | GPT-4V-turbo-detail:high (Visual Prompt) | GPT-4 score (bbox) | 60.7 |  | Unverified
2 | GPT-4V-turbo-detail:low (Visual Prompt) | GPT-4 score (bbox) | 52.8 |  | Unverified
3 | LLaVA-NeXT-Inst-IT-Qwen2-7B (Visual Prompt) | GPT-4 score (bbox) | 50.5 |  | Unverified
4 | ViP-LLaVA-13B (Visual Prompt) | GPT-4 score (bbox) | 48.3 |  | Unverified
5 | LLaVA-1.5-13B (Coordinates) | GPT-4 score (bbox) | 47.1 |  | Unverified
6 | Qwen-VL-Chat (Coordinates) | GPT-4 score (bbox) | 45.3 |  | Unverified
7 | LLaVA-NeXT-Inst-IT-Vicuna-7B (Visual Prompt) | GPT-4 score (bbox) | 45.1 |  | Unverified
8 | LLaVA-1.5-13B (Visual Prompt) | GPT-4 score (bbox) | 41.8 |  | Unverified
9 | Qwen-VL-Chat (Visual Prompt) | GPT-4 score (bbox) | 39.2 |  | Unverified
10 | InstructBLIP-13B (Visual Prompt) | GPT-4 score (bbox) | 35.8 |  | Unverified