Image Captioning

Image captioning is the task of describing the content of an image in words. It lies at the intersection of computer vision and natural language processing. Most image captioning systems use an encoder-decoder framework: an input image is encoded into an intermediate representation, which is then decoded into a descriptive text sequence. The most popular benchmarks are nocaps and COCO, and models are typically evaluated with the BLEU or CIDEr metric.

(Image credit: Reflective Decoding Network for Image Captioning, ICCV'19)
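
As a rough illustration of how n-gram overlap metrics such as BLEU score a generated caption, the sketch below computes the clipped (modified) n-gram precision of a candidate caption against a set of reference captions. It is only one ingredient of BLEU: the brevity penalty and the geometric mean over n-gram orders are omitted, and it is not CIDEr. The function names and the example captions are hypothetical and do not come from any benchmark listed on this page.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against several references.

    Each candidate n-gram is credited at most as many times as it occurs
    in the single reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0

    # Maximum count of every n-gram over all references.
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref[gram] = max(max_ref[gram], count)

    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


# Hypothetical captions, tokenized by whitespace for simplicity.
candidate = "a dog runs on the grass".split()
references = [
    "a dog is running on the grass".split(),
    "the dog runs across a field".split(),
]

p1 = clipped_precision(candidate, references, 1)  # unigram precision
p2 = clipped_precision(candidate, references, 2)  # bigram precision
print(f"unigram precision: {p1:.2f}, bigram precision: {p2:.2f}")
```

Clipping is what stops a degenerate caption that repeats a common word from scoring well: every repeat beyond the reference count earns no credit.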

Papers

Showing 1201–1250 of 1878 papers

Title	Date	Tasks	Status	Hype
Curious Case of Language Generation Evaluation Metrics: A Cautionary Tale	Oct 26, 2020	Image Captioning, Machine Translation	Unverified	0
Can images help recognize entities? A study of the role of images for Multimodal NER	Oct 23, 2020	Image Captioning, named-entity-recognition	Code Available	1
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information	Oct 21, 2020	Audio captioning, Decoder	Code Available	1
Bayesian Attention Modules	Oct 20, 2020	Image Captioning, Machine Translation	Code Available	1
Image Captioning with Visual Object Representations Grounded in the Textual Modality	Oct 19, 2020	Image Captioning, Object	Unverified	0
A Corpus for English-Japanese Multimodal Neural Machine Translation with Comparable Sentences	Oct 17, 2020	Image Captioning, Machine Translation	Unverified	0
New Ideas and Trends in Deep Multimodal Content Understanding: A Review	Oct 16, 2020	Cross-Modal Retrieval, Deep Learning	Unverified	0
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision	Oct 14, 2020	Image Captioning, Language Modeling	Code Available	1
Positioning yourself in the maze of Neural Text Generation: A Task-Agnostic Survey	Oct 14, 2020	Image Captioning, Machine Translation	Unverified	0
Visual News: Benchmark and Challenges in News Image Captioning	Oct 8, 2020	Articles, Image Captioning	Code Available	1
Dense Relational Image Captioning via Multi-task Triple-Stream Networks	Oct 8, 2020	Graph Generation, Image Captioning	Code Available	1
A Novel Actor Dual-Critic Model for Remote Sensing Image Captioning	Oct 5, 2020	Decoder, Deep Reinforcement Learning	Unverified	0
UNISON: Unpaired Cross-lingual Image Captioning	Oct 3, 2020	Caption Generation, Image Captioning	Unverified	0
CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns	Oct 2, 2020	Image Captioning, object-detection	Unverified	0
Pix2Prof: fast extraction of sequential information from galaxy imagery via a deep natural language 'captioning' model	Oct 1, 2020	CPU, Image Captioning	Code Available	1
Teacher-Critical Training Strategies for Image Captioning	Sep 30, 2020	Image Captioning, Reinforcement Learning (RL)	Unverified	0
Learning Object Detection from Captions via Textual Scene Attributes	Sep 30, 2020	Image Captioning, Object	Unverified	0
Spatial Attention as an Interface for Image Captioning Models	Sep 29, 2020	Image Captioning, Question Answering	Unverified	0
VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning	Sep 28, 2020	Image Captioning, Object	Unverified	0
Neural Twins Talk	Sep 26, 2020	Image Captioning, Sentence	Code Available	0
Are scene graphs good enough to improve Image Captioning?	Sep 25, 2020	Decoder, Graph Attention	Code Available	1
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers	Sep 23, 2020	Image Captioning, Image Generation	Code Available	1
Image Captioning with Attention for Smart Local Tourism using EfficientNet	Sep 18, 2020	Image Captioning	Code Available	0
A Multimodal Memes Classification: A Survey and Open Research Issues	Sep 17, 2020	Classification, General Classification	Unverified	0
Denoising Large-Scale Image Captioning from Alt-text Data using Content Selection Models	Sep 10, 2020	Caption Generation, Denoising	Unverified	0
Towards Unique and Informative Captioning of Images	Sep 8, 2020	Diversity, Image Captioning	Code Available	1
An Efficient Technique for Image Captioning using Deep Neural Network	Sep 5, 2020	Image Captioning, image-classification	Unverified	0
Structure-Aware Generation Network for Recipe Generation from Images	Sep 2, 2020	Image Captioning, Recipe Generation	Code Available	0
Hierarchical memory decoder for visual narrating	Sep 1, 2020	Decoder, Image Captioning	Unverified	0
A Survey of Evaluation Metrics Used for NLG Systems	Aug 27, 2020	Image Captioning, nlg evaluation	Unverified	0
Attr2Style: A Transfer Learning Approach for Inferring Fashion Styles via Apparel Attributes	Aug 26, 2020	Attribute, Image Captioning	Unverified	0
Protect, Show, Attend and Tell: Empowering Image Captioning Models with Ownership Protection	Aug 25, 2020	Image Captioning, image-classification	Code Available	1
Linguistically-aware Attention for Reducing the Semantic-Gap in Vision-Language Tasks	Aug 18, 2020	Image Captioning, Visual Question Answering (VQA)	Unverified	0
Text as Neural Operator: Image Manipulation by Text Instruction	Aug 11, 2020	Conditional Image Generation, Image Captioning	Code Available	1
Describe What to Change: A Text-guided Unsupervised Image-to-Image Translation Approach	Aug 10, 2020	Attribute, Image Captioning	Code Available	1
Assisting Scene Graph Generation with Self-Supervision	Aug 8, 2020	Graph Generation, Image Captioning	Unverified	0
Textual Description for Mathematical Equations	Aug 7, 2020	Image Captioning	Code Available	0
Fashion Captioning: Towards Generating Accurate Descriptions with Semantic Rewards	Aug 6, 2020	Attribute, Image Captioning	Code Available	1
Learning Visual Representations with Caption Annotations	Aug 4, 2020	Image Captioning, Language Modeling	Unverified	0
Recurrent Image Annotation With Explicit Inter-Label Dependencies	Aug 1, 2020	Image Captioning	Code Available	0
Learning to Generate Grounded Visual Captions without Localization Supervision	Aug 1, 2020	Image Captioning, Language Modelling	Code Available	1
Evaluating Automatically Generated Phoneme Captions for Images	Jul 31, 2020	Image Captioning	Unverified	0
Decomposing Generation Networks with Structure Prediction for Recipe Generation	Jul 27, 2020	Image Captioning, Recipe Generation	Unverified	0
Comprehensive Image Captioning via Scene Graph Decomposition	Jul 23, 2020	Diversity, Image Captioning	Code Available	1
Integrating Image Captioning with Rule-based Entity Masking	Jul 22, 2020	Diversity, Image Captioning	Unverified	0
Fine-Grained Image Captioning with Global-Local Discriminative Objective	Jul 21, 2020	Descriptive, Image Captioning	Code Available	0
Improving Diversity and Reducing Redundancy in Paragraph Captions	Jul 19, 2020	Decoder, Dense Captioning	Unverified	0
Length-Controllable Image Captioning	Jul 19, 2020	controllable image captioning, Decoder	Code Available	1
Consensus-Aware Visual-Semantic Embedding for Image-Text Matching	Jul 17, 2020	Image Captioning, Image-text matching	Code Available	1
Compare and Reweight: Distinctive Image Captioning Using Similar Images Sets	Jul 14, 2020	Image Captioning, Retrieval	Unverified	0

All datasets: VizWiz 2020 test-dev, COCO Captions, nocaps in-domain, nocaps near-domain, nocaps out-of-domain, nocaps entire, COCO (Common Objects in Context), VizWiz 2020 test, nocaps-XD entire, nocaps-val-in-domain, nocaps-val-overall, nocaps-XD in-domain

Benchmark Results

#	Model	Metric	Claimed	Verified	Status
1	IBM Research AI	CIDEr	80.67	—	Unverified
2	CASIA_IVA	CIDEr	79.15	—	Unverified
3	feixiang	CIDEr	77.31	—	Unverified
4	wocao	CIDEr	77.21	—	Unverified
5	lamiwab172	CIDEr	75.93	—	Unverified
6	RUC_AIM3	CIDEr	73.52	—	Unverified
7	funas	CIDEr	73.51	—	Unverified
8	SRC-B_VCLab	CIDEr	73.47	—	Unverified
9	sparta	CIDEr	73.41	—	Unverified
10	x-viz	CIDEr	73.26	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	VALOR	CIDEr	152.5	—	Unverified
2	VAST	CIDEr	149	—	Unverified
3	Virtex (ResNet-101)	CIDEr	94	—	Unverified
4	KOSMOS-1 (1.6B) (zero-shot)	CIDEr	84.7	—	Unverified
5	BLIP-FuseCap	CLIPScore	78.5	—	Unverified
6	mPLUG	BLEU-4	46.5	—	Unverified
7	OFA	BLEU-4	44.9	—	Unverified
8	GIT	BLEU-4	44.1	—	Unverified
9	BLIP-2 ViT-G OPT 2.7B (zero-shot)	BLEU-4	43.7	—	Unverified
10	BLIP-2 ViT-G OPT 6.7B (zero-shot)	BLEU-4	43.5	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	149.1	—	Unverified
2	GIT2, Single Model	CIDEr	124.18	—	Unverified
3	GIT, Single Model	CIDEr	122.4	—	Unverified
4	PaLI	CIDEr	121.09	—	Unverified
5	CoCa - Google Brain	CIDEr	117.9	—	Unverified
6	Microsoft Cognitive Services team	CIDEr	112.82	—	Unverified
7	Single Model	CIDEr	108.98	—	Unverified
8	GRIT (zero-shot, no VL pretraining, no CBS)	CIDEr	105.9	—	Unverified
9	FudanFVL	CIDEr	104.9	—	Unverified
10	FudanWYZ	CIDEr	104.25	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	GIT2, Single Model	CIDEr	125.51	—	Unverified
2	PaLI	CIDEr	124.35	—	Unverified
3	GIT, Single Model	CIDEr	123.92	—	Unverified
4	CoCa - Google Brain	CIDEr	120.73	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	115.54	—	Unverified
6	Single Model	CIDEr	110.76	—	Unverified
7	FudanFVL	CIDEr	109.33	—	Unverified
8	FudanWYZ	CIDEr	108.04	—	Unverified
9	IEDA-LAB	CIDEr	100.15	—	Unverified
10	firethehole	CIDEr	99.51	—	Unverified

#	Model	Metric	Claimed	Verified	Status
1	PaLI	CIDEr	126.67	—	Unverified
2	GIT2, Single Model	CIDEr	122.27	—	Unverified
3	GIT, Single Model	CIDEr	122.04	—	Unverified
4	CoCa - Google Brain	CIDEr	121.69	—	Unverified
5	Microsoft Cognitive Services team	CIDEr	110.14	—	Unverified
6	Single Model	CIDEr	109.49	—	Unverified
7	FudanFVL	CIDEr	106.55	—	Unverified
8	FudanWYZ	CIDEr	103.75	—	Unverified
9	Human	CIDEr	91.62	—	Unverified
10	firethehole	CIDEr	88.54	—	Unverified