SOTAVerified

Zero-Shot Composed Image Retrieval (ZS-CIR)

Given a query composed of a reference image and a relative caption, Composed Image Retrieval (CIR) aims to retrieve target images that are visually similar to the reference one but incorporate the changes specified in the relative caption. The bi-modality of the query provides users with more precise control over the characteristics of the desired image, as some features are more easily described with language, while others can be better expressed visually.

Zero-Shot Composed Image Retrieval (ZS-CIR) is a subtask of CIR whose goal is an approach that combines the reference image and the relative caption without requiring supervised learning.
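To make the task concrete, here is a minimal sketch of one simple training-free baseline: embed the reference image and the relative caption in a shared space, fuse them with a weighted sum, and rank gallery images by cosine similarity. This is an illustration only, not the method of any paper listed on this page. The function names (`compose_query`, `retrieve`), the weight `alpha`, and the 3-dimensional toy embeddings are all assumptions; a real system would obtain embeddings from a pretrained vision-language encoder.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def compose_query(image_emb, text_emb, alpha=0.5):
    """Fuse the two modalities with a weighted sum (hypothetical baseline).

    alpha is the weight of the caption; 1 - alpha is the weight of the image.
    """
    img = normalize(image_emb)
    txt = normalize(text_emb)
    return [(1 - alpha) * i + alpha * t for i, t in zip(img, txt)]

def retrieve(image_emb, text_emb, gallery, alpha=0.5, k=3):
    """Return the top-k (name, score) pairs from the gallery."""
    query = compose_query(image_emb, text_emb, alpha)
    scored = [(name, cosine(query, emb)) for name, emb in gallery.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy, hand-made embeddings (assumption): axis 0 = "dress",
# axis 1 = "red", axis 2 = "blue".
reference_image = [1.0, 0.0, 1.0]     # a blue dress
relative_caption = [0.0, 1.0, -1.0]   # "red instead of blue"
gallery = {
    "blue_dress": [1.0, 0.0, 1.0],
    "red_dress": [1.0, 1.0, 0.0],
    "red_car": [0.0, 1.0, 0.0],
}

ranking = retrieve(reference_image, relative_caption, gallery, alpha=0.5, k=3)
top_match = ranking[0][0]
print(ranking)
```

With these toy vectors the fused query points toward "red dress", so `red_dress` ranks first, ahead of `red_car` and the unmodified `blue_dress`. No parameters are trained, which is the sense in which such a baseline is zero-shot; the choice of `alpha` trades off fidelity to the reference image against the requested change.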

Papers

Showing 1-36 of 36 papers

Title | Status | Hype
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval | Code | 3
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions | Code | 3
Reason-before-Retrieve: One-Stage Reflective Chain-of-Thoughts for Training-Free Zero-Shot Composed Image Retrieval | Code | 2
Semantic Editing Increment Benefits Zero-Shot Composed Image Retrieval | Code | 2
LDRE: LLM-based Divergent Reasoning and Ensemble for Zero-Shot Composed Image Retrieval | Code | 2
An Efficient Post-hoc Framework for Reducing Task Discrepancy of Text Encoders for Composed Image Retrieval | Code | 2
Composed Image Retrieval for Remote Sensing | Code | 2
iSEARLE: Improving Textual Inversion for Zero-Shot Composed Image Retrieval | Code | 2
CoLLM: A Large Language Model for Composed Image Retrieval | Code | 1
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval | Code | 1
ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning | Code | 1
Composed Image Retrieval for Training-Free Domain Conversion | Code | 1
Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives | Code | 1
Knowledge-Enhanced Dual-stream Zero-shot Composed Image Retrieval | Code | 1
Language-only Efficient Training of Zero-shot Composed Image Retrieval | Code | 1
Vision-by-Language for Training-Free Compositional Image Retrieval | Code | 1
Context-I2W: Mapping Images to Context-dependent Words for Accurate Zero-Shot Composed Image Retrieval | Code | 1
CoVR-2: Automatic Data Construction for Composed Video Retrieval | Code | 1
Zero-shot Composed Text-Image Retrieval | Code | 1
Zero-Shot Composed Image Retrieval with Textual Inversion | Code | 1
CompoDiff: Versatile Composed Image Retrieval With Latent Diffusion | Code | 1
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval | Code | 1
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations | Code | 1
MLLM-Guided VLM Fine-Tuning with Joint Inference for Zero-Shot Composed Image Retrieval | - | 0
Multimodal Reasoning Agent for Zero-Shot Composed Image Retrieval | - | 0
Data-Efficient Generalization for Zero-shot Composed Image Retrieval | - | 0
CoTMR: Chain-of-Thought Multi-Scale Reasoning for Training-Free Zero-Shot Composed Image Retrieval | - | 0
PDV: Prompt Directional Vectors for Zero-shot Composed Image Retrieval | - | 0
SCOT: Self-Supervised Contrastive Pretraining For Zero-Shot Compositional Retrieval | - | 0
Imagine and Seek: Improving Composed Image Retrieval with an Imagined Proxy | - | 0
MoTaDual: Modality-Task Dual Alignment for Enhanced Zero-shot Composed Image Retrieval | - | 0
Denoise-I2W: Mapping Images to Denoising Words for Accurate Zero-Shot Composed Image Retrieval | Code | 0
Training-free Zero-shot Composed Image Retrieval via Weighted Modality Fusion and Similarity | Code | 0
Training-free Zero-shot Composed Image Retrieval with Local Concept Reranking | - | 0
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval | Code | 0
GeneCIS: A Benchmark for General Conditional Image Similarity | - | 0

No leaderboard results yet.