SOTAVerified

Image Description

Papers

Showing 1-10 of 154 papers

Title | Status | Hype
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2
PandaGPT: One Model To Instruction-Follow Them All | Code | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
Can Large Multimodal Models Uncover Deep Semantics Behind Images? | Code | 1
Chatting Makes Perfect: Chat-based Image Retrieval | Code | 1

No leaderboard results yet.