SOTAVerified

Visual Question Answering

Papers

Showing 881-890 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs | | 0 |
| Grounded Word Sense Translation | | 0 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | | 0 |
| LCV2: An Efficient Pretraining-Free Framework for Grounded Visual Question Answering | | 0 |
| Integrating Object Detection Modality into Visual Language Model for Enhanced Autonomous Driving Agent | | 0 |
| Interactive Attention AI to translate low light photos to captions for night scene understanding in women safety | | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | | 0 |
| Counterfactual Vision and Language Learning | | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | | 0 |
| Analysis on Image Set Visual Question Answering | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |