A Channel Mix Method for Fine-Grained Cross-Modal Retrieval
Yang Shen, Xuhao Sun, Xiu-Shen Wei, Hanxu Hu, Zhipeng Chen
Abstract
In this paper, we propose a simple but effective method for the challenging fine-grained cross-modal retrieval task, which aims to enable flexible retrieval among subordinate categories across different modalities. Specifically, in order to enhance information interaction between modalities for fine-grained objects, a channel mix method is developed and performed upon the channels of deep activations across different modalities. After that, a 1×1 convolution is employed to aggregate the mixed channels into a unified feature vector. Moreover, equipped with a novel fine-grained cross-modal center loss, our method further improves intra-class compactness as well as inter-class separability across modalities. Experiments conducted on the fine-grained cross-modal benchmark dataset show the superiority of our method over competing approaches, and ablation studies further demonstrate the effectiveness of the proposed components.
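As a rough illustration of the idea sketched in the abstract, the snippet below mixes the channels of two modality-specific activation maps and aggregates them with a 1×1 convolution into a single feature vector. The interleaving scheme, tensor shapes, and PyTorch framing are our own assumptions for readability (the released code is based on MindSpore); this is a minimal sketch, not the authors' exact implementation, and the fine-grained cross-modal center loss is omitted.

```python
import torch
import torch.nn as nn


class ChannelMix(nn.Module):
    """Sketch: mix channels of deep activations from two modalities,
    then aggregate the mixed channels with a 1x1 convolution."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # 1x1 convolution aggregates the 2*C mixed channels into a unified representation
        self.aggregate = nn.Conv2d(2 * in_channels, out_channels, kernel_size=1)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        # feat_a, feat_b: (N, C, H, W) activations from the two modality branches
        n, c, h, w = feat_a.shape
        # One plausible mixing scheme: interleave channels as a_1, b_1, a_2, b_2, ...
        mixed = torch.stack((feat_a, feat_b), dim=2).reshape(n, 2 * c, h, w)
        fused = self.aggregate(mixed)        # (N, out_channels, H, W)
        return fused.mean(dim=(2, 3))        # global average pooling -> feature vector


if __name__ == "__main__":
    mix = ChannelMix(in_channels=512, out_channels=512)
    feat_img = torch.randn(4, 512, 7, 7)    # e.g. image-branch activations
    feat_skt = torch.randn(4, 512, 7, 7)    # e.g. activations from another modality
    print(mix(feat_img, feat_skt).shape)    # torch.Size([4, 512])
```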