Image Description

Papers

Showing 1–10 of 154 papers

| Title | Status | Hype |
| --- | --- | --- |
| MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7 |
| MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7 |
| Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5 |
| Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3 |
| Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2 |
| Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2 |
| Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2 |
| PandaGPT: One Model To Instruction-Follow Them All | Code | 2 |
| Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Code | 1 |
| Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | Code | 1 |

No leaderboard results yet.