Data Roaming and Quality Assessment for Composed Image Retrieval

2023-03-16

Matan Levy, Rami Ben-Ari, Nir Darshan, Dani Lischinski

Code

github.com/levymsn/LaSCo
Official · none · ★ 10

Abstract

The task of Composed Image Retrieval (CoIR) involves queries that combine image and text modalities, allowing users to express their intent more effectively. However, current CoIR datasets are orders of magnitude smaller than other vision and language (V&L) datasets. Additionally, some of these datasets have noticeable issues, such as queries containing redundant modalities. To address these shortcomings, we introduce the Large Scale Composed Image Retrieval (LaSCo) dataset, a new CoIR dataset that is ten times larger than existing ones. Pre-training on LaSCo yields a noteworthy improvement in performance, even in the zero-shot setting. Furthermore, we propose a new approach for analyzing CoIR datasets and methods, which detects modality redundancy or necessity in queries. We also introduce a new CoIR baseline, the Cross-Attention driven Shift Encoder (CASE). This baseline allows for early fusion of modalities using a cross-attention module and employs an additional auxiliary task during training. Our experiments demonstrate that this new baseline outperforms the current state-of-the-art methods on established benchmarks such as FashionIQ and CIRR.

Tasks

Composed Image Retrieval (CoIR), Image Retrieval, Retrieval

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
CIRR	CASE (Pre-trained on LaSCo.Ca)	(Recall@5+Recall_subset@1)/2	78.25	—	Unverified
CIRR	CASE	(Recall@5+Recall_subset@1)/2	77.5	—	Unverified
Fashion IQ	CASE	(Recall@10+Recall@50)/2	59.73	—	Unverified
LaSCo	CASE	Recall@1 (%)	7.08	—	Unverified
LaSCo	BLIP4CIR	Recall@1 (%)	4.26	—	Unverified
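
The metrics in the table above are recall-based averages. As a minimal sketch of how such numbers could be computed, the Python below scores toy rankings. The function names, the toy data, and the reading of the subset variant (ranking only the candidates that belong to a given subset) are assumptions made for illustration and are not taken from the paper's code.

```python
def recall_at_k(rankings, targets, k):
    """Percentage of queries whose target appears among the top-k ranked candidates."""
    hits = sum(1 for ranked, target in zip(rankings, targets) if target in ranked[:k])
    return 100.0 * hits / len(targets)


def restrict_to_subset(ranked, subset):
    """Keep only candidates from `subset`, preserving rank order (assumed subset variant)."""
    allowed = set(subset)
    return [candidate for candidate in ranked if candidate in allowed]


def mean_of_two(first, second):
    """Average of two recall values, e.g. (Recall@10 + Recall@50) / 2."""
    return (first + second) / 2


# Hypothetical toy data: one ranked candidate list and one target per query.
rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"], ["a", "c", "b"]]
targets = ["a", "a", "a", "b"]

r_at_1 = recall_at_k(rankings, targets, 1)
r_at_2 = recall_at_k(rankings, targets, 2)
print(r_at_1, r_at_2, mean_of_two(r_at_1, r_at_2))

# Subset variant: rank only within a per-query subset before taking Recall@1.
subsets = [["a", "b"], ["a", "b"], ["a", "c"], ["b", "c"]]
subset_rankings = [restrict_to_subset(r, s) for r, s in zip(rankings, subsets)]
print(recall_at_k(subset_rankings, targets, 1))
```

Restricting the candidate pool before ranking makes the subset score depend only on how the model orders a small group of candidates, which is why it is reported separately from the full-gallery recall.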
