Cross-modal Retrieval with Improved Graph Convolution
ZHANG Hongtu, HUA Chunjian, JIANG Yi, YU Jianfeng, CHEN Ying
Abstract
Aiming at the problem that existing image-text cross-modal retrieval methods struggle to fully exploit intra-modal local consistency in the common subspace, a cross-modal retrieval method based on improved graph convolution is proposed. To improve local consistency within each modality, a modal graph is constructed with each individual sample as a node, fully mining the interaction information between features. To overcome the over-smoothing that restricts graph convolutional networks to shallow learning, an initial residual connection and a weight identity mapping are added to every graph-convolution layer. To jointly update central-node features with both higher-order and lower-order neighbor information, the network is further improved by reducing the number of neighbor nodes while increasing the number of layers. To learn common representations with high local consistency and semantic consistency, the modalities share the weights of the common-representation learning layer, and the intra-modal semantic constraints and the inter-modal invariance constraints are jointly optimized in the common subspace. Experimental results show that on the Wikipedia and Pascal Sentence cross-modal datasets, the average mAP of the proposed method across different retrieval tasks exceeds that of 11 existing methods by 2.2%-42.1% and 3.0%-54.0%, respectively.
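The deep graph-convolution layer the abstract describes (an initial residual connection plus a weight identity mapping, in the style of GCNII) can be sketched as follows. This is a minimal NumPy illustration under assumed notation, not the paper's implementation: `A_hat` is the symmetrically normalized adjacency of the modal graph, `H0` the initial sample features, `alpha` the initial-residual strength, and `beta` the identity-mapping strength.

```python
import numpy as np

def gcnii_layer(H, H0, A_hat, W, alpha, beta):
    """One graph-convolution layer with an initial residual connection
    (alpha * H0) and a weight identity mapping ((1-beta)*I + beta*W).
    Hypothetical formulation following the GCNII-style scheme the
    abstract names; not the authors' exact code."""
    support = (1 - alpha) * A_hat @ H + alpha * H0      # initial residual link
    mix = (1 - beta) * np.eye(W.shape[0]) + beta * W    # weight identity map
    return np.maximum(support @ mix, 0.0)               # ReLU activation

# Toy modal graph: 3 sample nodes with 4-dim features (illustrative data)
rng = np.random.default_rng(0)
H0 = rng.normal(size=(3, 4))
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)                  # sparse neighborhood
A_tilde = A + np.eye(3)                                 # add self-loops
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_tilde.sum(axis=1)))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt               # symmetric normalization

# Few neighbors but many layers: deeper stacking stays stable because
# every layer re-injects H0 and stays close to the identity mapping.
H = H0
for layer in range(8):
    W = rng.normal(scale=0.1, size=(4, 4))
    H = gcnii_layer(H, H0, A_hat, W, alpha=0.1, beta=0.5 / (layer + 1))
print(H.shape)  # (3, 4)
```

The loop illustrates the abstract's "fewer neighbors, more layers" idea: with a sparse adjacency, each layer only mixes low-order neighbors, while stacking layers propagates higher-order neighbor information to the central node without collapsing to over-smoothed features.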