SOTAVerified

Visual Question Answering

Papers

Showing 626-650 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks | | 0 |
| DUBLIN -- Document Understanding By Language-Image Network | | 0 |
| BuDDIE: A Business Document Dataset for Multi-task Information Extraction | | 0 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | | 0 |
| Adversarial Representation Learning for Text-to-Image Matching | | 0 |
| AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making | | 0 |
| Ontology-based knowledge representation for bone disease diagnosis: a foundation for safe and sustainable medical artificial intelligence systems | | 0 |
| DualNet: Domain-Invariant Network for Visual Question Answering | | 0 |
| Bridging the Semantic Gaps: Improving Medical VQA Consistency with LLM-Augmented Question Sets | | 0 |
| Dual Capsule Attention Mask Network with Mutual Learning for Visual Question Answering | | 0 |
| Bridge Damage Cause Estimation Using Multiple Images Based on Visual Question Answering | | 0 |
| How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | | 0 |
| Breaking Neural Network Scaling Laws with Modularity | | 0 |
| DreamSync: Aligning Text-to-Image Generation with Image Understanding Feedback | | 0 |
| Breaking Down Questions for Outside-Knowledge Visual Question Answering | | 0 |
| Answer-Type Prediction for Visual Question Answering | | 0 |
| How good are deep models in understanding the generated images? | | 0 |
| How to Design Sample and Computationally Efficient VQA Models | | 0 |
| Breaking Down Questions for Outside-Knowledge VQA | | 0 |
| Double Visual Defense: Adversarial Pre-training and Instruction Tuning for Improving Vision-Language Model Robustness | | 0 |
| Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images | | 0 |
| Adversarial Multimodal Network for Movie Question Answering | | 0 |
| Domain-robust VQA with diverse datasets and methods but no target labels | | 0 |
| Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion | | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |