Structural feature enhanced transformer for fine-grained image recognition

2025-06-14 · Pattern Recognition 2025

Ying Yu, Wei Wei, Cairong Zhao, Jin Qian, Enhong Chen

Abstract

Existing fine-grained image recognition (FGIR) models mainly rely on high-level semantic features to extract discriminative information, ignoring the potential role of the overall structural information of objects and the structural relationships between key parts. To address this issue, we propose the Structural Feature Enhancement Transformer (SFETrans). SFETrans consists of a visual transformer backbone network responsible for extracting complex semantic features. Additionally, it includes a structural modeling (SM) branch and an amplitude component exchange (ACE) module, both dedicated to enhancing the learning of structural features. The SM branch actively models the structural relationships between key parts of objects and extracts corresponding structural features, while the ACE module guides the model to learn structural information in the phase spectrum by introducing implicit constraints during training. By synergizing the backbone network and the two modules, SFETrans exhibits competitive performance on four benchmark datasets and outperforms other comparison methods in terms of computational efficiency.
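The abstract does not say how the ACE module is implemented. As a rough illustration of the general idea behind exchanging amplitude components while keeping the phase spectrum, the sketch below swaps the Fourier amplitudes of two single-channel images using NumPy. The function name `exchange_amplitude`, the 2D input shape, and the whole-spectrum swap are assumptions made for illustration, not the authors' method.

```python
import numpy as np


def exchange_amplitude(x_a, x_b):
    """Swap Fourier amplitudes between two equally shaped 2D arrays.

    Hypothetical helper (not from the paper): each output keeps the phase
    spectrum of one input and takes the amplitude spectrum of the other.
    """
    f_a = np.fft.fft2(x_a)
    f_b = np.fft.fft2(x_b)

    # Split each spectrum into amplitude and phase.
    amp_a, pha_a = np.abs(f_a), np.angle(f_a)
    amp_b, pha_b = np.abs(f_b), np.angle(f_b)

    # Recombine: phase from one image, amplitude from the other.
    a_mixed = np.fft.ifft2(amp_b * np.exp(1j * pha_a)).real
    b_mixed = np.fft.ifft2(amp_a * np.exp(1j * pha_b)).real
    return a_mixed, b_mixed


# Toy inputs standing in for two training images (assumed 8x8, one channel).
rng = np.random.default_rng(0)
img_a = rng.random((8, 8))
img_b = rng.random((8, 8))
a_mixed, b_mixed = exchange_amplitude(img_a, img_b)
```

In this sketch the phase of each input is preserved while its amplitude is replaced. A model trained on such mixed samples cannot rely on amplitude statistics alone, which is one plausible reading of the "implicit constraints" the abstract mentions.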

Tasks

Computational Efficiency, Fine-Grained Image Classification, Fine-Grained Image Recognition
