Image Captioning

Image captioning is the task of describing the content of an image in words. It lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework: an input image is encoded into an intermediate representation, which is then decoded into a descriptive text sequence. The most popular benchmarks are nocaps and COCO, and models are typically evaluated with the BLEU or CIDEr metric.

(Image credit: Reflective Decoding Network for Image Captioning, ICCV'19)
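To make the evaluation step mentioned above more concrete, here is a minimal sketch of a BLEU-style score in plain Python. It is a simplified, unsmoothed sentence-level variant written for illustration, not the official scorer used by any of the benchmarks on this page. The function names (`ngrams`, `clipped_precision`, `sentence_bleu`) are hypothetical, and whitespace-tokenized input is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Modified n-gram precision: each candidate n-gram count is clipped by
    the maximum count of that n-gram in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU in [0, 1] (illustrative sketch)."""
    precisions = [clipped_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, computed in log space.
    log_mean = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    reference = "a dog runs on the grass".split()
    candidate = "a dog runs on".split()
    print(sentence_bleu(candidate, [reference]))
```

Leaderboards usually report BLEU-4 multiplied by 100 and computed over a whole test set rather than per sentence, so the numbers from this sketch are not directly comparable with the tables below. CIDEr differs in that it weights n-grams by TF-IDF across the reference set, which this example does not attempt.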

Papers

Showing 1751–1775 of 1878 papers

Title	Date	Tasks	Status
A Girl Has A Name: Detecting Authorship Obfuscation	May 2, 2020	Authorship Attribution, Image Captioning	Code Available
Good News, Everyone! Context driven entity-aware captioning for news images	Apr 2, 2019	Articles, Descriptive	Code Available
UIT-ViIC: A Dataset for the First Evaluation on Vietnamese Image Captioning	Feb 1, 2020	Image Captioning, Vietnamese Datasets	Code Available
STAIR Captions: Constructing a Large-Scale Japanese Image Caption Dataset	May 2, 2017	Image Captioning, Machine Translation	Code Available
Can Active Memory Replace Attention?	Oct 27, 2016	Image Captioning, image-classification	Code Available
Good at captioning, bad at counting: Benchmarking GPT-4V on Earth observation data	Jan 31, 2024	Benchmarking, Change Detection	Code Available
Stepwise Decomposition and Dual-stream Focus: A Novel Approach for Training-free Camouflaged Object Segmentation	Jun 7, 2025	Camouflaged Object Segmentation, Feature Correlation	Code Available
UMONS Submission for WMT18 Multimodal Translation Task	Oct 15, 2018	Image Captioning, Machine Translation	Code Available
Oracle performance for visual captioning	Nov 14, 2015	Image Captioning, Language Modeling	Code Available
Global Object Proposals for Improving Multi-Sentence Video Descriptions	Jul 18, 2021	Caption Generation, Dense Video Captioning	Code Available
Structure-Aware Generation Network for Recipe Generation from Images	Sep 2, 2020	Image Captioning, Recipe Generation	Code Available
Aesthetic Image Captioning From Weakly-Labelled Photographs	Aug 29, 2019	Aesthetic Image Captioning, Benchmarking	Code Available
Geometry Attention Transformer with Position-aware LSTMs for Image Captioning	Oct 1, 2021	Decoder, Image Captioning	Code Available
Bridging Vision and Language Spaces with Assignment Prediction	Apr 15, 2024	Cross-Modal Retrieval, Image Captioning	Code Available
What value do explicit high level concepts have in vision to language problems?	Jun 3, 2015	Image Captioning, Question Answering	Code Available
Bridging the Visual Gap: Fine-Tuning Multimodal Models with Knowledge-Adapted Captions	Nov 13, 2024	Descriptive, Hallucination	Code Available
Bridging by Word: Image Grounded Vocabulary Construction for Visual Captioning	Jul 1, 2019	Decision Making, Image Captioning	Code Available
Visual Choice of Plausible Alternatives: An Evaluation of Image-based Commonsense Causal Reasoning	May 1, 2018	Commonsense Causal Reasoning, Image Captioning	Code Available
Geodesic Multi-Modal Mixup for Robust Fine-Tuning	Mar 8, 2022	Image Captioning, zero-shot-classification	Code Available
Surprisingly Easy Hard-Attention for Sequence to Sequence Learning	Oct 1, 2018	Hard Attention, Image Captioning	Code Available
A Comprehensive Survey of Deep Learning for Image Captioning	Oct 6, 2018	Deep Learning, Image Captioning	Code Available
Paraphrase Acquisition from Image Captions	Jan 26, 2023	Articles, Image Captioning	Code Available
Understanding Guided Image Captioning Performance across Domains	Dec 4, 2020	Descriptive, Image Captioning	Code Available
A request for clarity over the End of Sequence token in the Self-Critical Sequence Training	May 20, 2023	Image Captioning, Sentence	Code Available
Context-Aware Visual Policy Network for Fine-Grained Image Captioning	Jun 6, 2019	Image Captioning, Image Paragraph Captioning	Code Available


Datasets: VizWiz 2020 test-dev, COCO Captions, nocaps in-domain, nocaps near-domain, nocaps out-of-domain, nocaps entire, COCO (Common Objects in Context), VizWiz 2020 test, nocaps-XD entire, nocaps-val-in-domain, nocaps-val-overall, nocaps-XD in-domain

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IBM Research AI	CIDEr	80.67	—	Unverified
2	CASIA_IVA	CIDEr	79.15	—	Unverified
3	feixiang	CIDEr	77.31	—	Unverified
4	wocao	CIDEr	77.21	—	Unverified
5	lamiwab172	CIDEr	75.93	—	Unverified
6	RUC_AIM3	CIDEr	73.52	—	Unverified
7	funas	CIDEr	73.51	—	Unverified
8	SRC-B_VCLab	CIDEr	73.47	—	Unverified
9	sparta	CIDEr	73.41	—	Unverified
10	x-viz	CIDEr	73.26	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	VALOR	CIDEr	152.5	—	Unverified
2	VAST	CIDEr	149	—	Unverified
3	Virtex (ResNet-101)	CIDEr	94	—	Unverified
4	KOSMOS-1 (1.6B) (zero-shot)	CIDEr	84.7	—	Unverified
5	BLIP-FuseCap	CLIPScore	78.5	—	Unverified
6	mPLUG	BLEU-4	46.5	—	Unverified
7	OFA	BLEU-4	44.9	—	Unverified
8	GIT	BLEU-4	44.1	—	Unverified
9	BLIP-2 ViT-G OPT 2.7B (zero-shot)	BLEU-4	43.7	—	Unverified
10	BLIP-2 ViT-G OPT 6.7B (zero-shot)	BLEU-4	43.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	149.1	—	Unverified
2	GIT2, Single Model	CIDEr	124.18	—	Unverified
3	GIT, Single Model	CIDEr	122.4	—	Unverified
4	PaLI	CIDEr	121.09	—	Unverified
5	CoCa - Google Brain	CIDEr	117.9	—	Unverified
6	Microsoft Cognitive Services team	CIDEr	112.82	—	Unverified
7	Single Model	CIDEr	108.98	—	Unverified
8	GRIT (zero-shot, no VL pretraining, no CBS)	CIDEr	105.9	—	Unverified
9	FudanFVL	CIDEr	104.9	—	Unverified
10	FudanWYZ	CIDEr	104.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GIT2, Single Model	CIDEr	125.51	—	Unverified
2	PaLI	CIDEr	124.35	—	Unverified
3	GIT, Single Model	CIDEr	123.92	—	Unverified
4	CoCa - Google Brain	CIDEr	120.73	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	115.54	—	Unverified
6	Single Model	CIDEr	110.76	—	Unverified
7	FudanFVL	CIDEr	109.33	—	Unverified
8	FudanWYZ	CIDEr	108.04	—	Unverified
9	IEDA-LAB	CIDEr	100.15	—	Unverified
10	firethehole	CIDEr	99.51	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	126.67	—	Unverified
2	GIT2, Single Model	CIDEr	122.27	—	Unverified
3	GIT, Single Model	CIDEr	122.04	—	Unverified
4	CoCa - Google Brain	CIDEr	121.69	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	110.14	—	Unverified
6	Single Model	CIDEr	109.49	—	Unverified
7	FudanFVL	CIDEr	106.55	—	Unverified
8	FudanWYZ	CIDEr	103.75	—	Unverified
9	Human	CIDEr	91.62	—	Unverified
10	firethehole	CIDEr	88.54	—	Unverified