SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 751–800 of 2177 papers

Title | Status | Hype
Counting Everyday Objects in Everyday Scenes | Code | 0
Grounding Answers for Visual Questions Asked by Visually Impaired People | Code | 0
Object-aware Adaptive-Positivity Learning for Audio-Visual Question Answering | Code | 0
OG-SGG: Ontology-Guided Scene Graph Generation. A Case Study in Transfer Learning for Telepresence Robotics | Code | 0
OmniFusion Technical Report | Code | 0
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Code | 0
Adaptively Clustering Neighbor Elements for Image-Text Generation | Code | 0
Object Attribute Matters in Visual Question Answering | Code | 0
No Images, No Problem: Retaining Knowledge in Continual VQA with Questions-Only Memory | Code | 0
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | Code | 0
A Unified Hallucination Mitigation Framework for Large Vision-Language Models | Code | 0
Core Tokensets for Data-efficient Sequential Training of Transformers | Code | 0
Neural-Symbolic VQA: Disentangling Reasoning from Vision and Language Understanding | Code | 0
Automatic Generation of Contrast Sets from Scene Graphs: Probing the Compositional Consistency of GQA | Code | 0
Copy-Move Forgery Detection and Question Answering for Remote Sensing Image | Code | 0
NeSyCoCo: A Neuro-Symbolic Concept Composer for Compositional Generalization | Code | 0
Neural Module Networks | Code | 0
Grad-CAM: Why did you say that? | Code | 0
Convincing Rationales for Visual Question Answering Reasoning | Code | 0
NAAQA: A Neural Architecture for Acoustic Question Answering | Code | 0
Continual VQA for Disaster Response Systems | Code | 0
Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering | Code | 0
Augmenting Visual Question Answering with Semantic Frame Information in a Multitask Learning Approach | Code | 0
Open-Set Knowledge-Based Visual Question Answering with Inference Paths | Code | 0
Contextual Dropout: An Efficient Sample-Dependent Dropout Module | Code | 0
Attribute Diversity Determines the Systematicity Gap in VQA | Code | 0
Consistency of Compositional Generalization across Multiple Levels | Code | 0
Multi-Sourced Compositional Generalization in Visual Question Answering | Code | 0
MUREL: Multimodal Relational Reasoning for Visual Question Answering | Code | 0
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism | Code | 0
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering | Code | 0
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering | Code | 0
Adaptive loose optimization for robust question answering | Code | 0
Multimodal Residual Learning for Visual QA | Code | 0
Music's Multimodal Complexity in AVQA: Why We Need More than General Multimodal LLMs | Code | 0
Attention on Attention: Architectures for Visual Question Answering (VQA) | Code | 0
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering | Code | 0
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond | Code | 0
Adapting Lightweight Vision Language Models for Radiological Visual Question Answering | Code | 0
Multimodal Preference Data Synthetic Alignment with Reward Model | Code | 0
Compositionality as Lexical Symmetry | Code | 0
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Code | 0
Compositional Image-Text Matching and Retrieval by Grounding Entities | Code | 0
Multimodal Explanations: Justifying Decisions and Pointing to the Evidence | Code | 0
Generalizing Visual Question Answering from Synthetic to Human-Written Questions via a Chain of QA with a Large Language Model | Code | 0
Targeted Visual Prompting for Medical Visual Question Answering | Code | 0
Language Models Meet Anomaly Detection for Better Interpretability and Generalizability | Code | 0
Multi-modal Factorized Bilinear Pooling with Co-Attention Learning for Visual Question Answering | Code | 0
MUTAN: Multimodal Tucker Fusion for Visual Question Answering | Code | 0
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese | Code | 0
Page 16 of 44

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified