SOTAVerified

Image Comprehension

Papers

Showing 31–40 of 49 papers

| Title | Status | Hype |
| --- | --- | --- |
| Alleviating Hallucination in Large Vision-Language Models with Active Retrieval Augmentation | | 0 |
| SimpleVQA: Multimodal Factuality Evaluation for Multimodal Large Language Models | | 0 |
| SlideAVSR: A Dataset of Paper Explanation Videos for Audio-Visual Speech Recognition | | 0 |
| Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges | | 0 |
| Teach Multimodal LLMs to Comprehend Electrocardiographic Images | | 0 |
| Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-training and Multi-modal Tokens | | 0 |
| Unveiling Glitches: A Deep Dive into Image Encoding Bugs within CLIP | | 0 |
| What Large Language Models Bring to Text-rich VQA? | | 0 |
| On the Performance of Multimodal Language Models | | 0 |
| RAD: Retrieval-Augmented Decision-Making of Meta-Actions with Vision-Language Models in Autonomous Driving | | 0 |

No leaderboard results yet.