SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 876–900 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Examining Gender and Racial Bias in Large Vision-Language Models Using a Novel Dataset of Parallel Images | Code | 0 |
| SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models | Code | 7 |
| Convincing Rationales for Visual Question Answering Reasoning | Code | 0 |
| Text-Guided Image Clustering | Code | 1 |
| Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization | Code | 4 |
| GeReA: Question-Aware Prompt Captions for Knowledge-based Visual Question Answering | Code | 2 |
| Knowledge Generation for Zero-shot Knowledge-based VQA | Code | 0 |
| Instruction Makes a Difference | Code | 0 |
| Can Generative AI Support Patients' & Caregivers' Informational Needs? Towards Task-Centric Evaluation Of AI Systems | — | 0 |
| From Training-Free to Adaptive: Empirical Insights into MLLMs' Understanding of Detection Information | — | 0 |
| MouSi: Poly-Visual-Expert Vision-Language Models | Code | 2 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | — | 0 |
| Muffin or Chihuahua? Challenging Multimodal Large Language Models with Multipanel VQA | — | 0 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7 |
| LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | — | 0 |
| Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | — | 0 |
| Free Form Medical Visual Question Answering in Radiology | — | 0 |
| Small Language Model Meets with Reinforced Vision Vocabulary | — | 0 |
| SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities | — | 0 |
| Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge | Code | 1 |
| Veagle: Advancements in Multimodal Representation Learning | Code | 1 |
| Question-Answer Cross Language Image Matching for Weakly Supervised Semantic Segmentation | Code | 1 |
| COCO is "ALL" You Need for Visual Instruction Fine-tuning | — | 0 |
| Uncovering the Full Potential of Visual Grounding Methods in VQA | Code | 0 |
| BOK-VQA: Bilingual Outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining | — | 0 |
Page 36 of 88

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | — | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | — | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | — | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | — | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | — | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | — | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | — | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | — | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | — | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | — | Unverified |