SOTAVerified

Visual Question Answering

Papers

Showing 191–200 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| JourneyDB: A Benchmark for Generative Image Understanding | Code | 2 |
| DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception | Code | 2 |
| GMAI-VL & GMAI-VL-5.5M: A Large Vision-Language Model and A Comprehensive Multimodal Dataset Towards General Medical AI | Code | 2 |
| EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis | Code | 2 |
| Imp: Highly Capable Large Multimodal Models for Mobile Devices | Code | 2 |
| InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | Code | 2 |
| Keeping Yourself is Important in Downstream Tuning Multimodal Large Language Model | Code | 2 |
| Are Language Models Puzzle Prodigies? Algorithmic Puzzles Unveil Serious Challenges in Multimodal Reasoning | Code | 2 |
| Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate | Code | 2 |
| Grounding-IQA: Multimodal Language Grounding Model for Image Quality Assessment | Code | 2 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |