SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 751-775 of 2177 papers

Title | Status | Hype
Core Tokensets for Data-efficient Sequential Training of Transformers | Code | 0
Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Code | 0
MUREL: Multimodal Relational Reasoning for Visual Question Answering | Code | 0
Grad-CAM: Why did you say that? | Code | 0
Convincing Rationales for Visual Question Answering Reasoning | Code | 0
Counting Everyday Objects in Everyday Scenes | Code | 0
Multimodal Residual Learning for Visual QA | Code | 0
Continual VQA for Disaster Response Systems | Code | 0
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Code | 0
Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Code | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | Code | 0
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Code | 0
OmniFusion Technical Report | Code | 0
Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Code | 0
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Code | 0
Attribute Diversity Determines the Systematicity Gap in VQA | Code | 0
Consistency of Compositional Generalization across Multiple Levels | Code | 0
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Code | 0
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Code | 0
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Code | 0
Adaptive loose optimization for robust question answering | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Attention on Attention: Architectures for Visual Question Answering (VQA) | Code | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | - | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | - | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | - | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | - | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | - | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | - | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | - | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | - | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | - | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | - | Unverified