SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1001-1025 of 2177 papers

Title | Status | Hype
Measuring Faithful and Plausible Visual Grounding in VQA | Code | 0
Dual Attention Networks for Visual Reference Resolution in Visual Dialog | Code | 0
End-to-End Instance Segmentation with Recurrent Attention | Code | 0
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features | Code | 0
MapEval: A Map-Based Evaluation of Geo-Spatial Reasoning in Foundation Models | Code | 0
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding | Code | 0
MedHallTune: An Instruction-Tuning Benchmark for Mitigating Medical Hallucination in Vision-Language Models | Code | 0
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering | Code | 0
LXMERT Model Compression for Visual Question Answering | Code | 0
Learning to Count Objects in Natural Images for Visual Question Answering | Code | 0
Dual Recurrent Attention Units for Visual Question Answering | Code | 0
LPF: A Language-Prior Feedback Objective Function for De-biased Visual Question Answering | Code | 0
Learning to Follow Object-Centric Image Editing Instructions Faithfully | Code | 0
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs | Code | 0
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks | Code | 0
Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a Class-imbalance View | Code | 0
DVQA: Understanding Data Visualizations via Question Answering | Code | 0
Black-box Model Ensembling for Textual and Visual Question Answering via Information Fusion | Code | 0
Logical Implications for Visual Question Answering Consistency | Code | 0
Lost in Space: Probing Fine-grained Spatial Understanding in Vision and Language Resamplers | Code | 0
LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery | Code | 0
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Code | 0
Locally Smoothed Neural Networks | Code | 0
LLaVA-OneVision: Easy Visual Task Transfer | Code | 0
A Question-Centric Model for Visual Question Answering in Medical Imaging | Code | 0
Page 41 of 88

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | - | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | - | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | - | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | - | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | - | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | - | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | - | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | - | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | - | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | - | Unverified