VSE++: Improving Visual-Semantic Embeddings with Hard Negatives
Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler
Code
| Repository | Framework | Stars | Notes |
|---|---|---|---|
| github.com/fartashf/vsepp | PyTorch | 521 | Official; referenced in paper |
| github.com/cshizhe/hgr_v2t | PyTorch | 211 | |
| github.com/leolee99/CLIP_ITM | PyTorch | 19 | |
| github.com/salanueva/UniVSE | PyTorch | 10 | |
| github.com/armandvilalta/Full-network-multimodal-embeddings | none | 2 | |
| github.com/kadarakos/mulisera | PyTorch | 0 | |
| github.com/mitjanikolaus/compositional-image-captioning | PyTorch | 0 | |
| github.com/Cadene/recipe1m.bootstrap.pytorch | PyTorch | 0 | |
| github.com/rohitbhaskar/online-ads-repository | PyTorch | 0 | |
| github.com/gorjanradevski/vsepp_tensorflow | TensorFlow | 0 | |
Abstract
We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to common loss functions used for multi-modal embeddings. That, combined with fine-tuning and use of augmented data, yields significant gains in retrieval performance. We showcase our approach, VSE++, on MS-COCO and Flickr30K datasets, using ablation studies and comparisons with existing methods. On MS-COCO our approach outperforms state-of-the-art methods by 8.8% in caption retrieval and 11.3% in image retrieval (at R@1).
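The "simple change" the abstract refers to is replacing the sum over negatives in the standard triplet ranking loss with a max, so that only the hardest in-batch negative for each positive pair contributes to the loss. Below is a minimal PyTorch sketch of such a max-of-hinges loss; the function name, the default margin of 0.2, and the use of in-batch negatives are our assumptions for illustration, not the paper's exact implementation.

```python
import torch

def vse_max_hinge_loss(im, cap, margin=0.2):
    """Max-of-hinges triplet loss with in-batch hard negatives (sketch).
    `im` and `cap` are L2-normalized embeddings of shape (batch, dim);
    row i of each matrix forms a positive image-caption pair."""
    scores = im @ cap.t()                    # similarity s(i, c) for all pairs
    pos = scores.diag().view(-1, 1)          # s(i, c) of the matching pairs

    # hinge costs: captions as negatives (rows), images as negatives (columns)
    cost_cap = (margin + scores - pos).clamp(min=0)
    cost_im = (margin + scores - pos.t()).clamp(min=0)

    # zero out the positive pairs on the diagonal
    mask = torch.eye(scores.size(0), dtype=torch.bool, device=scores.device)
    cost_cap = cost_cap.masked_fill(mask, 0)
    cost_im = cost_im.masked_fill(mask, 0)

    # keep only the hardest negative per positive pair, instead of summing
    return cost_cap.max(dim=1)[0].sum() + cost_im.max(dim=0)[0].sum()
```

Summing the hinge terms instead of taking the max recovers the conventional ranking loss; the official repository exposes the analogous switch via a `max_violation` flag on its contrastive loss.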
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Flickr30K | VSE++ (ResNet) | Image-to-text R@1 | 52.9 | — | Unverified |