SOTAVerified

Image Description

Papers

Showing 1-25 of 154 papers

Title | Status | Hype
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2
PandaGPT: One Model To Instruction-Follow Them All | Code | 2
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Code | 1
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | Code | 1
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | Code | 1
Can Large Multimodal Models Uncover Deep Semantics Behind Images? | Code | 1
Towards image compression with perfect realism at ultra-low bitrates | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Chatting Makes Perfect: Chat-based Image Retrieval | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Code | 1
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Code | 1
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP | Code | 1
Revisiting Binary Local Image Description for Resource Limited Devices | Code | 1
Grounded Video Description | Code | 1
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Code | 1
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | | 0

No leaderboard results yet.