SOTAVerified

Visual Question Answering

MLLM Leaderboard

Papers

Showing 701-725 of 2177 papers

| Title | Status | Hype |
|---|---|---|
| Detect2Interact: Localizing Object Key Field in Visual Question Answering (VQA) with LLMs | | 0 |
| Designing a Robust Radiology Report Generation System | | 0 |
| Beyond Logit Lens: Contextual Embeddings for Robust Hallucination Detection & Grounding in VLMs | | 0 |
| Achieving Human Parity on Visual Question Answering | | 0 |
| Improving Data Augmentation for Robust Visual Question Answering with Effective Curriculum Learning | | 0 |
| Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions | | 0 |
| Inquire, Interact, and Integrate: A Proactive Agent Collaborative Framework for Zero-Shot Multimodal Medical Reasoning | | 0 |
| InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | | 0 |
| Beyond Human Vision: The Role of Large Vision Language Models in Microscope Image Analysis | | 0 |
| Advancing Medical Imaging with Language Models: A Journey from N-grams to ChatGPT | | 0 |
| DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs | | 0 |
| Beyond Captioning: Task-Specific Prompting for Improved VLM Performance in Mathematical Reasoning | | 0 |
| An experimental study of the vision-bottleneck in VQA | | 0 |
| Improved Baselines for Data-efficient Perceptual Augmentation of LLMs | | 0 |
| Improved Bilinear Pooling with CNNs | | 0 |
| An Evaluation of GPT-4V and Gemini in Online VQA | | 0 |
| Deep learning evaluation using deep linguistic processing | | 0 |
| Imperfect Vision Encoders: Efficient and Robust Tuning for Vision-Language Models | | 0 |
| Deep Exemplar Networks for VQA and VQG | | 0 |
| Deep Bayesian Active Learning for Multiple Correct Outputs | | 0 |
| BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering | | 0 |
| Deep Attention Neural Tensor Network for Visual Question Answering | | 0 |
| Decoupled Box Proposal and Featurization with Ultrafine-Grained Semantic Labels Improve Image Captioning and Visual Question Answering | | 0 |
| Benchmarking Vision Language Models for Cultural Understanding | | 0 |
| Decouple Before Interact: Multi-Modal Prompt Learning for Continual Visual Question Answering | | 0 |
Page 29 of 88

Benchmark Results

| # | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| 1 | MMCTAgent (GPT-4 + GPT-4V) | GPT-4 score | 74.24 | | Unverified |
| 2 | Qwen2-VL-72B | GPT-4 score | 74 | | Unverified |
| 3 | InternVL2.5-78B | GPT-4 score | 72.3 | | Unverified |
| 4 | GPT-4o +text rationale +IoT | GPT-4 score | 72.2 | | Unverified |
| 5 | Lyra-Pro | GPT-4 score | 71.4 | | Unverified |
| 6 | GLM-4V-Plus | GPT-4 score | 71.1 | | Unverified |
| 7 | Phantom-7B | GPT-4 score | 70.8 | | Unverified |
| 8 | InternVL2.5-38B | GPT-4 score | 68.8 | | Unverified |
| 9 | InternVL2-26B (SGP, token ratio 64%) | GPT-4 score | 65.6 | | Unverified |
| 10 | Baichuan-Omni (7B) | GPT-4 score | 65.4 | | Unverified |