SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 1526-1550 of 2177 papers

Title | Status | Hype
Reducing Language Biases in Visual Question Answering with Visually-Grounded Question Encoder | | 0
CLIP Models are Few-shot Learners: Empirical Studies on VQA and Visual Entailment | | 0
Visual question answering: from early developments to recent advances -- a survey | | 0
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering | | 0
Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective | | 0
CLEVR-POC: Reasoning-Intensive Visual Question Answering in Partially Observable Environments | | 0
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding | | 0
Visual Question Answering in Remote Sensing with Cross-Attention and Multimodal Information Bottleneck | | 0
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment | | 0
CL-CrossVQA: A Continual Learning Benchmark for Cross-Domain Visual Question Answering | | 0
Claude 3.5 Sonnet Model Card Addendum | | 0
Rephrasing visual questions by specifying the entropy of the answer distribution | | 0
Representation, Learning and Reasoning on Spatial Language for Downstream NLP Tasks | | 0
Representing Movie Characters in Dialogues | | 0
Reproducibility Report for "Learning To Count Objects In Natural Images For Visual Question Answering" | | 0
RepsNet: Combining Vision with Language for Automated Medical Reports | | 0
RescueADI: Adaptive Disaster Interpretation in Remote Sensing Images with Autonomous Agents | | 0
Visual Question Answering Instruction: Unlocking Multimodal Large Language Model To Domain-Specific Visual Multitasks | | 0
CLAMP: Contrastive LAnguage Model Prompt-tuning | | 0
Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization | | 0
Rethinking Visual Prompting for Multimodal Large Language Models with External Knowledge | | 0
VrR-VG: Refocusing Visually-Relevant Relationships | | 0
Retrieval-Augmented Natural Language Reasoning for Explainable Visual Question Answering | | 0
CIC: A Framework for Culturally-Aware Image Captioning | | 0
Retrieval-Augmented Visual Question Answering via Built-in Autoregressive Search Engines | | 0

Benchmark Results

# | Model | Metric | Claimed | Verified | Status
1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified
2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified
3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified
4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified
5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified
6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified
7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified
8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified
9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified
10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified