SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 681-690 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation | | 0 |
| Visual Question Answering on Multiple Remote Sensing Image Modalities | | 0 |
| SNAP: A Benchmark for Testing the Effects of Capture Conditions on Fundamental Vision Tasks | Code | 0 |
| Discovering Pathology Rationale and Token Allocation for Efficient Multimodal Pathology Reasoning | | 0 |
| TinyDrive: Multiscale Visual Question Answering with Selective Token Routing for Autonomous Driving | | 0 |
| Robo2VLM: Visual Question Answering from Large-Scale In-the-Wild Robot Manipulation Datasets | | 0 |
| Traveling Across Languages: Benchmarking Cross-Lingual Consistency in Multimodal LLMs | Code | 0 |
| Human-centered Interactive Learning via MLLMs for Text-to-Image Person Re-identification | | 0 |
| TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models | Code | 0 |
| Domain Adaptation of VLM for Soccer Video Understanding | | 0 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |