Structural feature enhanced transformer for fine-grained image recognition
Ying Yu, Wei Wei, Cairong Zhao, Jin Qian, Enhong Chen
Abstract
Existing fine-grained image recognition (FGIR) models mainly rely on high-level semantic features to extract discriminative information, ignoring the potential role of the overall structural information of objects and the structural relationships between key parts. To address this issue, we propose the Structural Feature Enhancement Transformer (SFETrans). SFETrans comprises a visual transformer backbone network that extracts complex semantic features, together with a structural modeling (SM) branch and an amplitude component exchange (ACE) module, both dedicated to enhancing the learning of structural features. The SM branch actively models the structural relationships between key parts of objects and extracts the corresponding structural features, while the ACE module guides the model to learn structural information in the phase spectrum by introducing implicit constraints during training. By combining the backbone network with these two modules, SFETrans achieves competitive performance on four benchmark datasets and outperforms the compared methods in computational efficiency.
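The abstract does not spell out how the ACE module operates, so the following is only a minimal sketch of one plausible reading: two images exchange their Fourier amplitude spectra while each keeps its own phase spectrum, so a label-preserving model would have to rely on the phase, where structural information is assumed to reside. The function names `dft2` and `exchange_amplitude` are hypothetical, and the naive DFT is used only to keep the example dependency-free; a practical implementation would likely use an FFT routine on image tensors.

```python
import cmath


def dft2(img, inverse=False):
    """Naive 2D discrete Fourier transform of a list-of-lists image."""
    h, w = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            total = 0j
            for y in range(h):
                for x in range(w):
                    angle = sign * 2 * cmath.pi * (u * y / h + v * x / w)
                    total += img[y][x] * cmath.exp(1j * angle)
            # The inverse transform carries the 1 / (h * w) normalization.
            out[u][v] = total / (h * w) if inverse else total
    return out


def exchange_amplitude(img_a, img_b):
    """Return an image with the amplitude of img_b and the phase of img_a.

    Assumption: this mirrors the amplitude exchange described in the
    abstract; the actual ACE module may differ in detail.
    """
    spec_a, spec_b = dft2(img_a), dft2(img_b)
    mixed = [
        [cmath.rect(abs(b), cmath.phase(a)) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(spec_a, spec_b)
    ]
    # For real inputs the mixed spectrum stays conjugate-symmetric, so the
    # imaginary part of the inverse is numerical noise and is dropped.
    return [[z.real for z in row] for row in dft2(mixed, inverse=True)]
```

During training, a pair such as `exchange_amplitude(x1, x2)` and `exchange_amplitude(x2, x1)` could be fed to the network with the labels of the phase donors, which is one way an implicit constraint toward phase information might be imposed. Whether SFETrans exchanges the full amplitude spectrum or only part of it is not stated in the abstract.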