CLIP-Based Modality Compensation for Visible-Infrared Image Re-Identification
Gang Hu; Yafei Lv; Jianting Zhang; Qian Wu; Zaidao Wen
Abstract
Visible-infrared image re-identification (VIReID) aims to match objects with the same identity across different modalities. The significant differences between visible and infrared images make VIReID a formidable challenge. Most existing methods focus on extracting modality-shared features while ignoring modality-specific features, which often also contain important discriminative information. In addition, high-level semantic information about the objects, such as shape and appearance, is also crucial for the VIReID task. To further enhance retrieval performance, we propose a novel one-stage CLIP-based Modality Compensation (CLIP-MC) method for the VIReID task. Our method introduces a new prompt learning paradigm that leverages the semantic understanding capabilities of CLIP to recover missing modality information. CLIP-MC comprises three key modules: Instance Text Prompt Generation (ITPG), Modality Compensation (MC), and Modality Context Learner (MCL). Specifically, the ITPG module facilitates effective alignment and interaction between image tokens and text tokens, enhancing the text encoder's ability to capture detailed visual information from the images. This ensures that the text encoder generates fine-grained descriptions of the images. The MCL module captures the unique information of each modality and generates modality-specific context tokens, which are more flexible than fixed text descriptions. Guided by the modality-specific context, the text encoder discovers missing modality information from the images and produces compensated modality features. Finally, the MC module combines the original and compensated modality features to obtain complete modality features that contain more discriminative information. We conduct extensive experiments on three VIReID datasets and compare our method with existing approaches to demonstrate its effectiveness and superiority.
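The final step described above, combining original and compensated modality features before cross-modality matching, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract only says the MC module "combines" the two features, so the concatenation used here is an assumption, and the function names (`complete_feature`, `rank_gallery`) and the hand-written vectors are hypothetical stand-ins for encoder outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def complete_feature(visible_part, infrared_part):
    """Build a 'complete' feature from a visible and an infrared part.

    ASSUMPTION: fusion is plain concatenation of the normalized parts.
    For a visible image, visible_part is the original feature and
    infrared_part is the compensated one; for an infrared image the
    roles are swapped. The slot order is fixed so that features from
    both modalities live in the same space.
    """
    return l2_normalize(l2_normalize(visible_part) + l2_normalize(infrared_part))


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


def rank_gallery(query, gallery):
    """Return (identity, score) pairs sorted from best to worst match."""
    scores = [(name, cosine(query, feat)) for name, feat in gallery.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Toy data (hypothetical): a visible query and two infrared gallery images.
query = complete_feature(visible_part=[1.0, 0.0], infrared_part=[0.0, 1.0])
gallery = {
    "id_a": complete_feature(visible_part=[0.9, 0.1], infrared_part=[0.1, 0.9]),
    "id_b": complete_feature(visible_part=[0.0, 1.0], infrared_part=[1.0, 0.0]),
}
ranking = rank_gallery(query, gallery)
print(ranking[0][0])
```

In this sketch each image ends up with both a visible and an infrared slot, so a visible query and an infrared gallery image can be compared directly with cosine similarity; how CLIP-MC actually produces and fuses the compensated features is specified in the paper, not here.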