SOTAVerified

Image Description

Papers

Showing 26–50 of 154 papers

Title | Status | Hype
Seeing the Unseen: Visual Common Sense for Semantic Placement | - | 0
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models | - | 0
Localized Symbolic Knowledge Distillation for Visual Commonsense Models | Code | 0
Impressions: Understanding Visual Semiotics and Aesthetic Impact | - | 0
Large Language Models can Share Images, Too! | Code | 0
Towards image compression with perfect realism at ultra-low bitrates | Code | 1
Bounding and Filling: A Fast and Flexible Framework for Image Captioning | Code | 0
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | Code | 7
ContextRef: Evaluating Referenceless Metrics For Image Description Generation | Code | 0
A skeletonization algorithm for gradient-based optimization | Code | 1
A Fine-Grained Image Description Generation Method Based on Joint Objectives | - | 0
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | Code | 5
Chatting Makes Perfect: Chat-based Image Retrieval | Code | 1
PandaGPT: One Model To Instruction-Follow Them All | Code | 2
DiffCap: Exploring Continuous Diffusion on Image Captioning | - | 0
Caption Anything: Interactive Image Description with Diverse Multimodal Controls | Code | 3
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | Code | 7
Fan-Beam Binarization Difference Projection (FB-BDP): A Novel Local Object Descriptor for Fine-Grained Leaf Image Retrieval | Code | 0
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset | Code | 1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation | Code | 1
Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval | Code | 0
Facial Expression Recognition and Image Description Generation in Vietnamese | - | 0
Skeletal Human Action Recognition using Hybrid Attention based Graph Convolutional Network | Code | 0
Image Description Dataset for Language Learners | - | 0
Multilingual Image Corpus – Towards a Multimodal and Multilingual Dataset | - | 0

No leaderboard results yet.