SOTAVerified

Visual Question Answering

Papers

Showing 1051-1100 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Fusion of Domain-Adapted Vision and Language Models for Medical Visual Question Answering | | 0 |
| FVQA 2.0: Introducing Adversarial Samples into Fact-based Visual Question Answering | | 0 |
| FVQA: Fact-based Visual Question Answering | | 0 |
| Gamified crowd-sourcing of high-quality data for visual fine-tuning | | 0 |
| GC-KBVQA: A New Four-Stage Framework for Enhancing Knowledge Based Visual Question Answering Performance | | 0 |
| GEMeX: A Large-Scale, Groundable, and Explainable Medical VQA Benchmark for Chest X-ray Diagnosis | | 0 |
| GEMeX-ThinkVG: Towards Thinking with Visual Grounding in Medical VQA via Reinforcement Learning | | 0 |
| Gemini Pro Defeated by GPT-4V: Evidence from Education | | 0 |
| Gender and Racial Bias in Visual Question Answering Datasets | | 0 |
| Generalization Differences between End-to-End and Neuro-Symbolic Vision-Language Reasoning Systems | | 0 |
| Generalized Hadamard-Product Fusion Operators for Visual Question Answering | | 0 |
| Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge | | 0 |
| Generating Natural Language Explanations for Visual Question Answering using Scene Graphs and Visual Attention | | 0 |
| Generating Natural Questions from Images for Multimodal Assistants | | 0 |
| Generating Rationales in Visual Question Answering | | 0 |
| Generating Triples with Adversarial Networks for Scene Graph Construction | | 0 |
| Generative Visual Question Answering | | 0 |
| Generic Attention-model Explainability by Weighted Relevance Accumulation | | 0 |
| GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing | | 0 |
| GeoRSMLLM: A Multimodal Large Language Model for Vision-Language Tasks in Geoscience and Remote Sensing | | 0 |
| GiVE: Guiding Visual Encoder to Perceive Overlooked Information | | 0 |
| γ-MoD: Exploring Mixture-of-Depth Adaptation for Multimodal Large Language Models | | 0 |
| Goal-Oriented Semantic Communication for Wireless Visual Question Answering | | 0 |
| Good, Better, Best: Textual Distractors Generation for Multiple-Choice Visual Question Answering via Reinforcement Learning | | 0 |
| GPT-4V Explorations: Mining Autonomous Driving | | 0 |
| GRADE: Quantifying Sample Diversity in Text-to-Image Models | | 0 |
| GRAM: Global Reasoning for Multi-Page VQA | | 0 |
| Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network | | 0 |
| Graph Neural Networks in Vision-Language Image Understanding: A Survey | | 0 |
| Bilinear Graph Networks for Visual Question Answering | | 0 |
| Graph Relation Transformer: Incorporating pairwise object features into the Transformer architecture | | 0 |
| Graph-Structured Representations for Visual Question Answering | | 0 |
| GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback | | 0 |
| Griffon-G: Bridging Vision-Language and Vision-Centric Tasks via Large Multimodal Models | | 0 |
| GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions | | 0 |
| Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray | | 0 |
| Grounded Word Sense Translation | | 0 |
| Grounding Answers for Visual Questions Asked by Visually Impaired People | | 0 |
| Grounding Chest X-Ray Visual Question Answering with Generated Radiology Reports | | 0 |
| Grounding Complex Navigational Instructions Using Scene Graphs | | 0 |
| Grounding Task Assistance with Multimodal Cues from a Single Demonstration | | 0 |
| Guiding Visual Question Answering with Attention Priors | | 0 |
| H2OVL-Mississippi Vision Language Models Technical Report | | 0 |
| Hadamard product in deep learning: Introduction, Advances and Challenges | | 0 |
| Hallucination at a Glance: Controlled Visual Edits and Fine-Grained Multimodal Learning | | 0 |
| HAMMR: HierArchical MultiModal React agents for generic VQA | | 0 |
| Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation | | 0 |
| Hierarchical Graph Attention Network for Few-Shot Visual-Semantic Learning | | 0 |
| Hierarchical Modeling for Medical Visual Question Answering with Cross-Attention Fusion | | 0 |
| HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision | | 0 |
Page 22 of 44

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |