Image Captioning

Image captioning is the task of describing the content of an image in natural language. It lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework: an encoder maps the input image to an intermediate representation of its content, and a decoder generates a descriptive text sequence from that representation. The most popular benchmarks are nocaps and COCO, and models are typically evaluated with metrics such as BLEU and CIDEr.

(Image credit: Reflective Decoding Network for Image Captioning, ICCV'19)
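The encoder-decoder pipeline described in the overview can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions: the encoder (`encode`, one mean intensity per image row) and the decoder step (`toy_step`, a hand-written transition table) are hypothetical stand-ins for a trained vision backbone and language model. Only the greedy decoding loop, which repeatedly emits the highest-scoring token until an end token or a length limit is reached, is meant to reflect how captioning decoders generate text.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel intensities in [0, 1]
StepFn = Callable[[List[float], List[str]], List[float]]


def encode(image: Image) -> List[float]:
    """Hypothetical encoder: one mean intensity per row.
    A real system would use a CNN or vision Transformer here."""
    return [sum(row) / len(row) for row in image]


def greedy_decode(features: List[float], step: StepFn, vocab: Sequence[str],
                  start: str = "<start>", end: str = "<end>",
                  max_len: int = 20) -> List[str]:
    """Emit the highest-scoring token at each step until `end` or `max_len`."""
    tokens = [start]
    for _ in range(max_len):
        scores = step(features, tokens)  # one score per vocabulary entry
        word = vocab[max(range(len(vocab)), key=lambda i: scores[i])]
        if word == end:
            break
        tokens.append(word)
    return tokens[1:]  # drop the start token


VOCAB = ["<end>", "a", "bright", "dark", "image"]


def toy_step(features: List[float], tokens: List[str]) -> List[float]:
    """Hypothetical decoder step: a hand-written transition table that
    conditions on the image features instead of a trained language model."""
    brightness = sum(features) / len(features)
    prev = tokens[-1]
    if prev == "<start>":
        nxt = "a"
    elif prev == "a":
        nxt = "bright" if brightness > 0.5 else "dark"
    elif prev in ("bright", "dark"):
        nxt = "image"
    else:
        nxt = "<end>"
    return [1.0 if w == nxt else 0.0 for w in VOCAB]


def caption(image: Image) -> str:
    """Encode the image, then decode a caption from its features."""
    return " ".join(greedy_decode(encode(image), toy_step, VOCAB))
```

For example, `caption([[0.9, 0.8], [0.7, 1.0]])` returns `"a bright image"`. Many captioning systems replace greedy search with beam search; in this sketch that change would be confined to `greedy_decode`.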

Papers


Showing 1851–1878 of 1878 papers

Title	Date	Tasks	Status
Neural Attention for Image Captioning: Review of Outstanding Methods	Nov 29, 2021	Decoder, Deep Learning	Unverified
Neural Caption Generation for News Images	May 1, 2018	Caption Generation, Image Captioning	Unverified
Neural Headline Generation on Abstract Meaning Representation	Nov 1, 2016	Abstract Meaning Representation, Dependency Parsing	Unverified
Neural Image Captioning	Jul 2, 2019	Image Captioning, Machine Translation	Unverified
Neural Joking Machine : Humorous image captioning	May 30, 2018	Image Captioning	Unverified
Neural Machine Translation: Basics, Practical Aspects and Recent Trends	Nov 1, 2017	Image Captioning, Machine Translation	Unverified
Neural Monkey: The Current State and Beyond	Mar 1, 2018	Image Captioning, Machine Translation	Unverified
Neural Scene De-Rendering	Jul 1, 2017	Decoder, Image Captioning	Unverified
Neural Text Generation with Artificial Negative Examples	Dec 28, 2020	Image Captioning, Machine Translation	Unverified
Neuro-Symbolic Learning: Principles and Applications in Ophthalmology	Jul 31, 2022	Common Sense Reasoning, Image Captioning	Unverified
NEVLP: Noise-Robust Framework for Efficient Vision-Language Pre-training	Sep 15, 2024	Contrastive Learning, cross-modal alignment	Unverified
New Ideas and Trends in Deep Multimodal Content Understanding: A Review	Oct 16, 2020	Cross-Modal Retrieval, Deep Learning	Unverified
New Encoder Learning for Captioning Heavy Rain Images via Semantic Visual Feature Matching	May 28, 2021	Decoder, Image Captioning	Unverified
NICE: CVPR 2023 Challenge on Zero-shot Image Captioning	Sep 5, 2023	Fairness, Image Captioning	Unverified
NLIP: Noise-robust Language-Image Pre-training	Dec 14, 2022	Image Captioning, Image-text Retrieval	Unverified
NLPHut’s Participation at WAT2021	Aug 1, 2021	Caption Generation, Image Captioning	Unverified
NNEval: Neural Network based Evaluation Metric for Image Captioning	Sep 1, 2018	Image Captioning, Sentence	Unverified
No Detail Left Behind: Revisiting Self-Retrieval for Fine-Grained Image Captioning	Sep 4, 2024	Image Captioning, Retrieval	Unverified
Non-Autoregressive Image Captioning with Counterfactuals-Critical Multi-Agent Learning	May 10, 2020	Image Captioning, Machine Translation	Unverified
Nonparametric Method for Data-driven Image Captioning	Jun 1, 2014	Density Estimation, Image Captioning	Unverified
Normalized and Geometry-Aware Self-Attention Network for Image Captioning	Mar 19, 2020	Image Captioning, Machine Translation	Unverified
NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI	May 20, 2025	Anomaly Localization, Benchmarking	Unverified
O2NA: An Object-Oriented Non-Autoregressive Approach for Controllable Video Captioning	Aug 5, 2021	Attribute, Caption Generation	Unverified
OBJ2TEXT: Generating Visually Descriptive Language from Object Layouts	Jul 22, 2017	Caption Generation, Descriptive	Unverified
Object Counts! Bringing Explicit Detections Back into Image Captioning	Apr 23, 2018	Image Captioning, Language Modeling	Unverified
Object-oriented backdoor attack against image captioning	Jan 5, 2024	Backdoor Attack, Image Captioning	Unverified
ODIANLP’s Participation in WAT2020	Dec 1, 2020	Hindi Image Captioning, Image Captioning	Unverified
Off-Policy Self-Critical Training for Transformer in Visual Paragraph Generation	Jun 21, 2020	Image Captioning, Reinforcement Learning (RL)	Unverified

Datasets: VizWiz 2020 test-dev, COCO Captions, nocaps in-domain, nocaps near-domain, nocaps out-of-domain, nocaps entire, COCO (Common Objects in Context), VizWiz 2020 test, nocaps-XD entire, nocaps-val-in-domain, nocaps-val-overall, nocaps-XD in-domain

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IBM Research AI	CIDEr	80.67	—	Unverified
2	CASIA_IVA	CIDEr	79.15	—	Unverified
3	feixiang	CIDEr	77.31	—	Unverified
4	wocao	CIDEr	77.21	—	Unverified
5	lamiwab172	CIDEr	75.93	—	Unverified
6	RUC_AIM3	CIDEr	73.52	—	Unverified
7	funas	CIDEr	73.51	—	Unverified
8	SRC-B_VCLab	CIDEr	73.47	—	Unverified
9	sparta	CIDEr	73.41	—	Unverified
10	x-viz	CIDEr	73.26	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	VALOR	CIDEr	152.5	—	Unverified
2	VAST	CIDEr	149	—	Unverified
3	Virtex (ResNet-101)	CIDEr	94	—	Unverified
4	KOSMOS-1 (1.6B) (zero-shot)	CIDEr	84.7	—	Unverified
5	BLIP-FuseCap	CLIPScore	78.5	—	Unverified
6	mPLUG	BLEU-4	46.5	—	Unverified
7	OFA	BLEU-4	44.9	—	Unverified
8	GIT	BLEU-4	44.1	—	Unverified
9	BLIP-2 ViT-G OPT 2.7B (zero-shot)	BLEU-4	43.7	—	Unverified
10	BLIP-2 ViT-G OPT 6.7B (zero-shot)	BLEU-4	43.5	—	Unverified
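The leaderboard above ranks several entries by BLEU-4, the geometric mean of clipped 1- to 4-gram precisions multiplied by a brevity penalty. A minimal sentence-level sketch follows, assuming whitespace-tokenized captions, uniform weights and no smoothing. Leaderboard values such as 46.5 are on a 0 to 100 scale and are usually computed at corpus level with the COCO caption evaluation toolkit after its own tokenization, so this sketch will not reproduce them exactly.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], references: List[List[str]],
                  max_n: int = 4) -> float:
    """Sentence-level BLEU with uniform weights and no smoothing (range 0 to 1)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref: Counter = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total) / max_n
    # Brevity penalty against the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum)
```

For a candidate `"a man riding a horse"` against the single reference `"a man riding a horse on the beach"`, every n-gram precision is 1, so the score equals the brevity penalty exp(1 - 8/5), roughly 0.549.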

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	149.1	—	Unverified
2	GIT2, Single Model	CIDEr	124.18	—	Unverified
3	GIT, Single Model	CIDEr	122.4	—	Unverified
4	PaLI	CIDEr	121.09	—	Unverified
5	CoCa - Google Brain	CIDEr	117.9	—	Unverified
6	Microsoft Cognitive Services team	CIDEr	112.82	—	Unverified
7	Single Model	CIDEr	108.98	—	Unverified
8	GRIT (zero-shot, no VL pretraining, no CBS)	CIDEr	105.9	—	Unverified
9	FudanFVL	CIDEr	104.9	—	Unverified
10	FudanWYZ	CIDEr	104.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GIT2, Single Model	CIDEr	125.51	—	Unverified
2	PaLI	CIDEr	124.35	—	Unverified
3	GIT, Single Model	CIDEr	123.92	—	Unverified
4	CoCa - Google Brain	CIDEr	120.73	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	115.54	—	Unverified
6	Single Model	CIDEr	110.76	—	Unverified
7	FudanFVL	CIDEr	109.33	—	Unverified
8	FudanWYZ	CIDEr	108.04	—	Unverified
9	IEDA-LAB	CIDEr	100.15	—	Unverified
10	firethehole	CIDEr	99.51	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	126.67	—	Unverified
2	GIT2, Single Model	CIDEr	122.27	—	Unverified
3	GIT, Single Model	CIDEr	122.04	—	Unverified
4	CoCa - Google Brain	CIDEr	121.69	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	110.14	—	Unverified
6	Single Model	CIDEr	109.49	—	Unverified
7	FudanFVL	CIDEr	106.55	—	Unverified
8	FudanWYZ	CIDEr	103.75	—	Unverified
9	Human	CIDEr	91.62	—	Unverified
10	firethehole	CIDEr	88.54	—	Unverified
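Most of the leaderboards above rank entries by CIDEr. CIDEr weights each n-gram by TF-IDF, with document frequency counted over the reference captions of every image in the evaluation set, averages the cosine similarity between the candidate and each reference, and then averages over n = 1 to 4. Because of the IDF term, n-grams that appear in the references of many images (such as "a") contribute little, while distinctive n-grams dominate. The sketch below implements this original formulation under the assumption of pre-tokenized captions. The leaderboard figures come from the CIDEr-D variant in the COCO caption evaluation code, which adds count clipping, a Gaussian length penalty and a scale factor, so they are not directly comparable to this sketch's output.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Caption = List[str]


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tfidf(counts: Counter, doc_freq: Counter, num_images: int) -> Dict[tuple, float]:
    # Raw counts serve as TF; cosine similarity is invariant to rescaling a vector,
    # so normalizing by caption length would not change the result.
    return {g: c * math.log(num_images / max(1.0, doc_freq[g])) for g, c in counts.items()}


def cosine(u: Dict[tuple, float], v: Dict[tuple, float]) -> float:
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cider(candidates: List[Caption], references: List[List[Caption]],
          max_n: int = 4) -> List[float]:
    """Per-image CIDEr scores (original formulation, not CIDEr-D).
    Each image needs at least one reference caption."""
    num_images = len(references)
    scores = [0.0] * len(candidates)
    for n in range(1, max_n + 1):
        # Document frequency: number of images whose references contain the n-gram.
        doc_freq: Counter = Counter()
        for refs in references:
            doc_freq.update(set().union(*(ngram_counts(r, n) for r in refs)))
        for i, (cand, refs) in enumerate(zip(candidates, references)):
            vc = tfidf(ngram_counts(cand, n), doc_freq, num_images)
            sims = [cosine(vc, tfidf(ngram_counts(r, n), doc_freq, num_images)) for r in refs]
            scores[i] += (sum(sims) / len(refs)) / max_n
    return scores
```

Note that IDF needs a corpus: with a single image every reference n-gram has an IDF of log(1) = 0, and the score collapses to zero. Captions shorter than n tokens contribute nothing for that n, so an exact three-token match scores 0.75 with the default `max_n=4`.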