SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 526–550 of 2177 papers

| Title | Status | Hype |
| --- | --- | --- |
| Greedy Gradient Ensemble for Robust Visual Question Answering | Code | 1 |
| Separating Skills and Concepts for Novel Visual Question Answering | Code | 1 |
| How Much Can CLIP Benefit Vision-and-Language Tasks? | Code | 1 |
| Graphhopper: Multi-Hop Scene Graph Reasoning for Visual Question Answering | Code | 1 |
| Zero-shot Visual Question Answering using Knowledge Graph | Code | 1 |
| Mind Your Outliers! Investigating the Negative Impact of Outliers on Active Learning for Visual Question Answering | Code | 1 |
| RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words | Code | 1 |
| Predicting Human Scanpaths in Visual Question Answering | Code | 1 |
| Perception Matters: Detecting Perception Failures of VQA Models Using Metamorphic Testing | Code | 1 |
| Probing Image-Language Transformers for Verb Understanding | Code | 1 |
| Check It Again: Progressive Visual Question Answering via Visual Entailment | Code | 1 |
| Multi-modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training | Code | 1 |
| Multiple Meta-model Quantifying for Medical Visual Question Answering | Code | 1 |
| Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules | Code | 1 |
| Passage Retrieval for Outside-Knowledge Visual Question Answering | Code | 1 |
| MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding | Code | 1 |
| GraghVQA: Language-Guided Graph Neural Networks for Graph-based Visual Question Answering | Code | 1 |
| Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering | Code | 1 |
| MMBERT: Multimodal BERT Pretraining for Improved Medical VQA | Code | 1 |
| VisQA: X-raying Vision and Language Reasoning in Transformers | Code | 1 |
| Are Bias Mitigation Techniques for Deep Learning Effective? | Code | 1 |
| Towards General Purpose Vision Systems | Code | 1 |
| Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers | Code | 1 |
| Multi-Modal Answer Validation for Knowledge-Based VQA | Code | 1 |
| Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | Code | 1 |

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |