SOTAVerified

Image Description

Papers

Showing 1-10 of 154 papers

Title | Status | Hype
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
PandaGPT: One Model To Instruction-Follow Them All | Code | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Code | 1

No leaderboard results yet.