Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives

2024-04-17

Zhangchi Feng, Richong Zhang, Zhijie Nie


Code

github.com/BUAADreamer/SPN4CIR
Official · In paper · PyTorch · ★ 39

Abstract

The Composed Image Retrieval (CIR) task aims to retrieve target images using a composed query consisting of a reference image and a modification text. Advanced methods often use contrastive learning as the optimization objective, which benefits from adequate positive and negative examples. However, CIR triplets incur high manual annotation costs, which limits the number of positive examples. Furthermore, existing methods commonly use in-batch negative sampling, which limits the number of negatives available to the model. To address the lack of positives, we propose a data generation method that leverages a multi-modal large language model to construct triplets for CIR. To introduce more negatives during fine-tuning, we design a two-stage fine-tuning framework for CIR whose second stage introduces plenty of static representations of negatives to optimize the representation space rapidly. The two improvements can be stacked effectively and are designed to be plug-and-play, so they can be applied to existing CIR models without changing their original architectures. Extensive experiments and ablation analysis demonstrate that our method effectively scales positives and negatives and achieves state-of-the-art results on both the FashionIQ and CIRR datasets. In addition, our method performs well in zero-shot composed image retrieval, providing a new CIR solution for low-resource scenarios. Our code and data are released at https://github.com/BUAADreamer/SPN4CIR.
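To make the in-batch versus static-negative distinction concrete, here is a minimal sketch of an InfoNCE-style contrastive loss in plain Python. It is not the SPN4CIR implementation: the function name `contrastive_loss`, the `negative_bank` parameter, and the default temperature are illustrative assumptions. Without a bank, each composed-query embedding is contrasted only against the other targets in its batch. With a bank, precomputed ("static") negative embeddings are appended to the candidate set, which is one plausible reading of how a second stage can expose the model to many more negatives. In a real training loop the bank entries would be detached from the autograd graph; plain Python has no gradients, so that aspect is not modeled here.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    # L2-normalize so that dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def contrastive_loss(
    queries: Sequence[Vector],
    targets: Sequence[Vector],
    negative_bank: Optional[Sequence[Vector]] = None,  # hypothetical static negatives
    temperature: float = 0.07,  # assumed value, not taken from the paper
) -> float:
    """Mean InfoNCE loss; the positive for queries[i] is targets[i].

    Without negative_bank: in-batch setup, every other target is a negative.
    With negative_bank: extra precomputed negative embeddings are appended
    to the candidate set, so each query sees many more negatives.
    """
    candidates = [_normalize(t) for t in targets]
    if negative_bank:
        candidates += [_normalize(n) for n in negative_bank]
    total = 0.0
    for i, q in enumerate(queries):
        qn = _normalize(q)
        logits = [_dot(qn, c) / temperature for c in candidates]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching target (index i) as the label.
        total += log_denominator - logits[i]
    return total / len(queries)
```

For example, `contrastive_loss(queries, targets)` gives the in-batch loss, while `contrastive_loss(queries, targets, negative_bank=bank)` adds the bank entries to every query's denominator. Because each extra negative contributes a positive term to the log-sum-exp, the loss can only grow when negatives are added, which is why a larger negative set supplies a stronger training signal.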

Tasks

Contrastive Learning, Image Retrieval, Image Retrieval on Fashion IQ, Language Modelling, Large Language Model, Retrieval, Triplet, Zero-Shot Composed Image Retrieval (ZS-CIR)

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIRR	SPN4CIR (SPRC)	(Recall@5+Recall_subset@1)/2	82.69	—	Unverified
Fashion IQ	SPN4CIR (SPRC)	(Recall@10+Recall@50)/2	66.41	—	Unverified
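The two metrics in the table are averages of Recall@K values. The sketch below shows that arithmetic under simple assumptions: each query comes with a ranked list of gallery IDs, and for CIRR a second ranking restricted to the candidate subset that the dataset supplies for that query. The function names are hypothetical and are not the official evaluation scripts. Fashion IQ results are also commonly computed per garment category and then averaged, which this single-query-set sketch does not model.

```python
from typing import Hashable, Sequence


def recall_at_k(rankings: Sequence[Sequence[Hashable]],
                targets: Sequence[Hashable], k: int) -> float:
    """Percentage of queries whose target appears in the top-k of its ranking."""
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)


def fashion_iq_score(rankings, targets) -> float:
    """(Recall@10 + Recall@50) / 2, as in the Fashion IQ row of the table."""
    return (recall_at_k(rankings, targets, 10) + recall_at_k(rankings, targets, 50)) / 2


def cirr_score(rankings, subset_rankings, targets) -> float:
    """(Recall@5 + Recall_subset@1) / 2, as in the CIRR row of the table.

    rankings[i] ranks the full gallery for query i; subset_rankings[i]
    ranks only the candidate subset provided for that query.
    """
    return (recall_at_k(rankings, targets, 5)
            + recall_at_k(subset_rankings, targets, 1)) / 2
```

Averaging a coarse recall (large K over the full gallery) with a fine-grained one (Recall_subset@1 over a small candidate set) rewards a model that both retrieves the target broadly and ranks it first among close distractors.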
