SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 391–400 of 2177 papers

Title | Status | Hype
CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning | Code | 1
An Approach to Solving the Abstraction and Reasoning Corpus (ARC) Challenge | Code | 1
Comprehensive Visual Question Answering on Point Clouds through Compositional Scene Manipulation | Code | 1
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Code | 1
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images | Code | 1
Cross-modal Information Flow in Multimodal Large Language Models | Code | 1
Learning to Contrast the Counterfactual Samples for Robust Visual Question Answering | Code | 1
BackdoorMBTI: A Backdoor Learning Multimodal Benchmark Tool Kit for Backdoor Defense Evaluation | Code | 1
Learning to Answer Visual Questions from Web Videos | Code | 1
Learning to Discretely Compose Reasoning Module Networks for Video Captioning | Code | 1

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified