SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1051–1075 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Toloka Visual Question Answering Benchmark | Code | 1 |
| Tackling VQA with Pretrained Foundation Models without Further Training | | 0 |
| Sentence Attention Blocks for Answer Grounding | | 0 |
| DreamLLM: Synergistic Multimodal Comprehension and Creation | Code | 2 |
| KOSMOS-2.5: A Multimodal Literate Model | | 0 |
| Visual Question Answering in the Medical Domain | | 0 |
| An Empirical Study of Scaling Instruct-Tuned Large Multimodal Models | Code | 6 |
| Syntax Tree Constrained Graph Network for Visual Question Answering | | 0 |
| D3: Data Diversity Design for Systematic Generalization in Visual Question Answering | Code | 0 |
| TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild | Code | 1 |
| Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning | | 0 |
| Interpretable Visual Question Answering via Reasoning Supervision | | 0 |
| Evaluation and Enhancement of Semantic Grounding in Large Vision-Language Models | | 0 |
| A Survey on Interpretable Cross-modal Reasoning | Code | 1 |
| Physically Grounded Vision-Language Models for Robotic Manipulation | | 0 |
| Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding | Code | 0 |
| Separate and Locate: Rethink the Text in Text-based Visual Question Answering | Code | 0 |
| Expanding Frozen Vision-Language Models without Retraining: Towards Improved Robot Perception | | 0 |
| UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory | Code | 1 |
| Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP | Code | 1 |
| DLIP: Distilling Language-Image Pre-training | | 0 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5 |
| InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4 | Code | 1 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | | 0 |
| VQA Therapy: Exploring Answer Differences by Visually Grounding Answers | Code | 0 |
Page 43 of 88

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |