SOTAVerified

Zero-Shot Composed Image Retrieval (ZS-CIR)

Given a query composed of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images that are visually similar to the reference one but incorporate the changes specified in the relative caption. The bi-modality of the query provides users with more precise control over the characteristics of the desired image, as some features are more easily described with language, while others can be better expressed visually.

Zero-Shot Composed Image Retrieval (ZS-CIR) is a subtask of CIR in which the approach must combine the reference image and the relative caption without relying on supervised learning.
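To make the task definition concrete, the following is a minimal sketch of composed retrieval over precomputed embeddings. It assumes that images and captions already live in a shared embedding space, and it uses a naive additive composition (sum of the normalized image and caption vectors) purely for illustration; this is not the method of any paper listed on this page. The function names, the 3-dimensional toy vectors, and the gallery entries are all hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def compose_query(image_emb, text_emb):
    """Naive composition (an assumption): sum of unit image and caption embeddings."""
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return normalize([a + b for a, b in zip(img, txt)])

def retrieve(query, gallery, k=1):
    """Return the names of the k gallery items with the highest cosine similarity."""
    scored = []
    for name, emb in gallery.items():
        unit = normalize(emb)
        scored.append((sum(q * g for q, g in zip(query, unit)), name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# Made-up toy embeddings: axis 0 ~ "dress", axis 1 ~ "red", axis 2 ~ "blue".
reference_image = [1.0, 0.0, 1.0]    # a blue dress
relative_caption = [0.0, 1.0, -1.0]  # "red instead of blue"
gallery = {
    "blue_dress": [1.0, 0.0, 1.0],
    "red_dress": [1.0, 1.0, 0.0],
    "red_car": [0.0, 1.0, 0.0],
}

query = compose_query(reference_image, relative_caption)
ranking = retrieve(query, gallery, k=3)
print(ranking)  # ['red_dress', 'red_car', 'blue_dress']
```

In this toy setup the top result keeps the visual content of the reference (a dress) while applying the change requested by the caption (red instead of blue). No composition module is trained on labeled examples here, which is the sense in which such a pipeline is zero-shot; a real system would obtain the embeddings from a pretrained vision-language model.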

Papers

Showing 31-36 of 36 papers

Title | Status | Hype
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | - | 0
MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval | - | 0
PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | - | 0
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | - | 0
GeneCIS: A Benchmark for General Conditional Image Similarity | - | 0
SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | - | 0

No leaderboard results yet.