SOTAVerified

Image Description

Papers

Showing 1-50 of 154 papers

Title | Status | Hype
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner | Code | 2
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model | Code | 2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions | Code | 2
PandaGPT: One Model To Instruction-Follow Them All | Code | 2
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment | Code | 1
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression | Code | 1
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models | Code | 1
Can Large Multimodal Models Uncover Deep Semantics Behind Images? | Code | 1
Towards image compression with perfect realism at ultra-low bitrates | Code | 1
A skeletonization algorithm for gradient-based optimization | Code | 1
Chatting Makes Perfect: Chat-based Image Retrieval | Code | 1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Code | 1
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling | Code | 1
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP | Code | 1
Revisiting Binary Local Image Description for Resource Limited Devices | Code | 1
Grounded Video Description | Code | 1
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations | Code | 1
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models | Code | 1
CIDEr: Consensus-based Image Description Evaluation | Code | 1
Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism | - | 0
LaMOuR: Leveraging Language Models for Out-of-Distribution Recovery in Reinforcement Learning | - | 0
VisBias: Measuring Explicit and Implicit Social Biases in Vision Language Models | Code | 0
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech | - | 0
IDEA: Image Description Enhanced CLIP-Adapter | Code | 0
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis | - | 0
A Preliminary Survey of Semantic Descriptive Model for Images | - | 0
RRHF-V: Ranking Responses to Mitigate Hallucinations in Multimodal Large Language Models with Human Feedback | Code | 0
Personalizing Multimodal Large Language Models for Image Captioning: An Experimental Analysis | - | 0
TypeScore: A Text Fidelity Metric for Text-to-Image Generative Models | - | 0
MiCEval: Unveiling Multimodal Chain of Thought's Quality via Image Description and Reasoning Steps | Code | 0
Difficult Task Yes but Simple Task No: Unveiling the Laziness in Multimodal LLMs | Code | 0
Language Augmentation in CLIP for Improved Anatomy Detection on Multi-modal Medical Images | - | 0
Data-augmented phrase-level alignment for mitigating object hallucination | - | 0
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization | - | 0
MAGID: An Automated Pipeline for Generating Synthetic Multi-modal Datasets | Code | 0
Artwork Explanation in Large-scale Vision Language Models | - | 0
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models | - | 0
Seeing the Unseen: Visual Common Sense for Semantic Placement | - | 0
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | - | 0
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0
Impressions: Understanding Visual Semiotics and Aesthetic Impact | - | 0
Large Language Models can Share Images, Too! | Code | 0
Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Code | 0
ContextRef: Evaluating Referenceless Metrics For Image Description Generation | Code | 0
A Fine-Grained Image Description Generation Method Based on Joint Objectives | - | 0

No leaderboard results yet.