SOTAVerified

Visual Question Answering

Papers

Showing 876-900 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Co-VQA : Answering by Interactive Sub Question Sequence | | 0 |
| Instruction-augmented Multimodal Alignment for Image-Text and Element Matching | | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | | 0 |
| Co-VQA : Answering by Interactive Sub Question Sequence | | 0 |
| Learning Sparsity for Effective and Efficient Music Performance Question Answering | | 0 |
| Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models | | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | | 0 |
| CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding | | 0 |
| Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent | | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | | 0 |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | | 0 |
| Grounded Word Sense Translation | | 0 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | | 0 |
| Learning Rich Image Region Representation for Visual Question Answering | | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | | 0 |
| Counterfactual Vision and Language Learning | | 0 |
| Interpretable Counting for Visual Question Answering | | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | | 0 |
| Analysis on Image Set Visual Question Answering | | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | | 0 |
| Graph-Structured Representations for Visual Question Answering | | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | | 0 |
| Bilinear Graph Networks for Visual Question Answering | | 0 |
Page 36 of 88

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |