SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1701-1750 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Breaking Neural Network Scaling Laws with Modularity |  | 0 |
| Spatial Attention as an Interface for Image Captioning Models |  | 0 |
| Spatial Knowledge Distillation to aid Visual Reasoning |  | 0 |
| SpatialReasoner: Towards Explicit and Generalizable 3D Spatial Reasoning |  | 0 |
| SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities |  | 0 |
| Advancing Surgical VQA with Scene Graph Knowledge |  | 0 |
| Breaking Down Questions for Outside-Knowledge Visual Question Answering |  | 0 |
| Breaking Down Questions for Outside-Knowledge VQA |  | 0 |
| SplatTalk: 3D VQA with Gaussian Splatting |  | 0 |
| Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images |  | 0 |
| Boosting Cross-task Transferability of Adversarial Patches with Visual Relations |  | 0 |
| Stacked Latent Attention for Multimodal Reasoning |  | 0 |
| Stacking with Auxiliary Features for Visual Question Answering |  | 0 |
| StackOverflowVQA: Stack Overflow Visual Question Answering Dataset |  | 0 |
| Steering LVLMs via Sparse Autoencoder for Hallucination Mitigation |  | 0 |
| BOK-VQA: Bilingual outside Knowledge-Based Visual Question Answering via Graph Representation Pretraining |  | 0 |
| Blocks as Probes: Dissecting Categorization Ability of Large Multimodal Models |  | 0 |
| Story Generation from Visual Inputs: Techniques, Related Tasks, and Challenges |  | 0 |
| Straight to the Facts: Learning Knowledge Base Retrieval for Factual Visual Question Answering |  | 0 |
| Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization |  | 0 |
| StructuralLM: Structural Pre-training for Form Understanding |  | 0 |
| Structure Causal Models and LLMs Integration in Medical Visual Question Answering |  | 0 |
| Advancing Multimodal Medical Capabilities of Gemini |  | 0 |
| xGQA: Cross-Lingual Visual Question Answering |  | 0 |
| Structured Two-stream Attention Network for Video Question Answering |  | 0 |
| Structure Guided Multi-modal Pre-trained Transformer for Knowledge Graph Reasoning |  | 0 |
| Structure Learning for Neural Module Networks |  | 0 |
| Sunny and Dark Outside?! Improving Answer Consistency in VQA through Entailed Question Generation |  | 0 |
| Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions |  | 0 |
| Feedback-Driven Vision-Language Alignment with Minimal Human Supervision |  | 0 |
| VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks |  | 0 |
| Beyond the Hype: A dispassionate look at vision-language models in medical scenario |  | 0 |
| Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models |  | 0 |
| Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery |  | 0 |
| SurgicalVLM-Agent: Towards an Interactive AI Co-Pilot for Pituitary Surgery |  | 0 |
| Beyond the Frame: Generating 360° Panoramic Videos from Perspective Videos |  | 0 |
| Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs |  | 0 |
| Survey of Large Multimodal Model Datasets, Application Categories and Taxonomy |  | 0 |
| Survey of Recent Advances in Visual Question Answering |  | 0 |
| Survey of Visual Question Answering: Datasets and Techniques |  | 0 |
| Survey of Visual-Semantic Embedding Methods for Zero-Shot Image Retrieval |  | 0 |
| SViQA: A Unified Speech-Vision Multimodal Model for Textless Visual Question Answering |  | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis |  | 0 |
| Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework |  | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning |  | 0 |
| Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input |  | 0 |
| SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment |  | 0 |
| BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering |  | 0 |
| Syntax Tree Constrained Graph Network for Visual Question Answering |  | 0 |
| Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA |  | 0 |
Page 35 of 44

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 |  | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 |  | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 |  | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 |  | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 |  | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 |  | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 |  | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 |  | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 |  | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 |  | Unverified |