Image Description

Papers

Showing 1–50 of 154 papers

Title	Date	Tasks	Status	Hype
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning	Oct 14, 2023	Image Classification, Image Description	Code Available	7
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models	Apr 20, 2023	Image Description, Language Modelling	Code Available	7
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond	Aug 24, 2023	Chart Question Answering, FS-MEVQA	Code Available	5
Caption Anything: Interactive Image Description with Diverse Multimodal Controls	May 4, 2023	controllable image captioning, Image Captioning	Code Available	3
PandaGPT: One Model To Instruction-Follow Them All	May 25, 2023	All, Image Description	Code Available	2
Patho-R1: A Multimodal Reinforcement Learning-Based Pathology Expert Reasoner	May 16, 2025	Cross-Modal Retrieval, Diagnostic	Code Available	2
Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model	Mar 10, 2025	Image Description, Image Generation	Code Available	2
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions	Jun 11, 2024	Hallucination, Image Description	Code Available	2
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models	May 19, 2015	Image Description, Phrase Grounding	Code Available	1
A skeletonization algorithm for gradient-based optimization	Sep 5, 2023	Benchmarking, Deep Learning	Code Available	1
UniTAB: Unifying Text and Box Outputs for Grounded Vision-Language Modeling	Nov 23, 2021	Image Captioning, Image Description	Code Available	1
DialogCC: An Automated Pipeline for Creating High-Quality Multi-Modal Dialogue Dataset	Dec 8, 2022	Diversity, Image Description	Code Available	1
Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation	Oct 20, 2022	Decoder, Image Captioning	Code Available	1
Zero-Shot Out-of-Distribution Detection Based on the Pre-trained Model CLIP	Sep 6, 2021	Image Description, Out-of-Distribution Detection	Code Available	1
Revisiting Binary Local Image Description for Resource Limited Devices	Aug 18, 2021	Image Description, Triplet	Code Available	1
Towards image compression with perfect realism at ultra-low bitrates	Oct 16, 2023	Image Compression, Image Description	Code Available	1
Grounded Video Description	Dec 17, 2018	Image Description, Sentence	Code Available	1
Text-Visual Semantic Constrained AI-Generated Image Quality Assessment	Jul 14, 2025	Image Description, Image Quality Assessment	Code Available	1
Mitigating Hallucinations in Vision-Language Models through Image-Guided Head Suppression	May 22, 2025	Hallucination, Image Description	Code Available	1
Chatting Makes Perfect: Chat-based Image Retrieval	May 31, 2023	Chat-based Image Retrieval, Image Description	Code Available	1
CIDEr: Consensus-based Image Description Evaluation	Nov 20, 2014	Action Recognition, Attribute	Code Available	1
Can Large Multimodal Models Uncover Deep Semantics Behind Images?	Feb 17, 2024	Image Description	Code Available	1
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations	Feb 23, 2016	image-classification, Image Classification	Code Available	1
SPIDER: A Comprehensive Multi-Organ Supervised Pathology Dataset and Baseline Models	Mar 4, 2025	Image Description	Code Available	1
Focused Evaluation for Image Description with Binary Forced-Choice Tasks	Aug 1, 2016	Image Captioning, Image Description	Unverified	0
Computer Vision and Conflicting Values: Describing People with Automated Alt Text	May 26, 2021	Image Description	Unverified	0
A Fine-Grained Image Description Generation Method Based on Joint Objectives	Sep 2, 2023	Image Description, Object	Unverified	0
A Genetic Algorithm Approach for Image Representation Learning through Color Quantization	Nov 18, 2017	Content-Based Image Retrieval, Image Description	Unverified	0
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching	Jun 1, 2013	Image Description, Video Description	Unverified	0
From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning	Oct 11, 2016	Form, Grounded language learning	Unverified	0
Comparing Automatic Evaluation Measures for Image Description	Jun 1, 2014	Image Description, Slot Filling	Unverified	0
Collecting Image Description Datasets using Crowdsourcing	Nov 12, 2014	Image Description, Sentence	Unverified	0
Advanced Chest X-Ray Analysis via Transformer-Based Image Descriptors and Cross-Model Attention Mechanism	Apr 23, 2025	Decoder, Image Description	Unverified	0
Doubly-Attentive Decoder for Multi-modal Neural Machine Translation	Feb 4, 2017	Decoder, Image Description	Unverified	0
A Cognitive Evaluation Benchmark of Image Reasoning and Description for Large Vision-Language Models	Feb 28, 2024	Image Description, Question Answering	Unverified	0
Findings of the Second Shared Task on Multimodal Machine Translation and Multilingual Image Description	Oct 19, 2017	Image Description, Machine Translation	Unverified	0
Generating Image Captions in Arabic using Root-Word Based Recurrent Neural Networks and Deep Neural Networks	Jun 1, 2018	Caption Generation, Image Captioning	Unverified	0
Exploring the Use of Contrastive Language-Image Pre-Training for Human Posture Classification: Insights from Yoga Pose Analysis	Jan 13, 2025	Image Description, Transfer Learning	Unverified	0
Artwork Explanation in Large-scale Vision Language Models	Feb 29, 2024	Explanation Generation, Image Description	Unverified	0
Exploring Visual Relationship for Image Captioning	Sep 19, 2018	Decoder, Image Captioning	Unverified	0
DiffCap: Exploring Continuous Diffusion on Image Captioning	May 20, 2023	Caption Generation, Diversity	Unverified	0
DIDEC: The Dutch Image Description and Eye-tracking Corpus	Aug 1, 2018	Image Description, Specificity	Unverified	0
A Preliminary Survey of Semantic Descriptive Model for Images	Jan 13, 2025	Descriptive, Image Description	Unverified	0
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space	Nov 19, 2017	Caption Generation, Image Description	Unverified	0
Adding the Third Dimension to Spatial Relation Detection in 2D Images	Nov 1, 2018	Image Description, Object	Unverified	0
Don't Mention the Shoe! A Learning to Rank Approach to Content Selection for Image Description Generation	Sep 1, 2016	Image Description, Image Retrieval	Unverified	0
Exploring the Behavior of Classic REG Algorithms in the Description of Characters in 3D Images	Sep 1, 2017	Image Description, Referring Expression	Unverified	0
Draw and Tell: Multimodal Descriptions Outperform Verbal- or Sketch-Only Descriptions in an Image Retrieval Task	Nov 1, 2017	Image Description, Image Retrieval	Unverified	0
A Shared Task on Multimodal Machine Translation and Crosslingual Image Description	Aug 1, 2016	Image Description, Image Retrieval	Unverified	0
Face2Text revisited: Improved data set and baseline results	May 24, 2022	Image Description, Transfer Learning	Unverified	0

No leaderboard results yet.