HSPFormer: Hierarchical Spatial Perception Transformer for Semantic Segmentation

2025-01-16 · IEEE Transactions on Intelligent Transportation Systems, 2025

Siyu Chen, Ting Han, Changshe Zhang, Jinhe Su, Ruisheng Wang, Yiping Chen, Zongyue Wang, Guorong Cai


Abstract

Semantic perception in driving scenarios plays a crucial role in intelligent transportation systems. However, existing Transformer-based semantic segmentation methods often fall short of their potential for dynamically understanding driving scenes. These methods typically lack spatial reasoning and fail to effectively correlate image pixels with their spatial positions, leading to attention drift. To address this issue, we propose a novel architecture, the Hierarchical Spatial Perception Transformer (HSPFormer), which integrates monocular depth estimation and semantic segmentation into a unified framework for the first time. We introduce the Spatial Depth Perception Auxiliary Network (SDPNet), which performs multiscale feature extraction and multilayer depth map prediction to establish hierarchical spatial coherence. Additionally, we design the Hierarchical Pyramid Transformer Network (HPTNet), which uses depth estimation as learnable position embeddings to form spatially correlated semantic representations and generate global contextual information. Experiments on the KITTI-360, Cityscapes, and NYU Depth V2 benchmarks demonstrate that HSPFormer outperforms several state-of-the-art networks, achieving 66.82% top-1 mIoU on KITTI-360, 83.8% mIoU on Cityscapes, and 57.7% mIoU on NYU Depth V2, respectively. The code will be made publicly available at https://github.com/SY-Ch/HSPFormer.
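The central mechanism the abstract describes, injecting predicted depth into the transformer as learnable position embeddings, can be sketched in a few lines. This is a minimal NumPy illustration under assumed shapes, not the authors' implementation: the per-patch depth values, the linear projection `w_proj`, and the embedding dimension are all hypothetical stand-ins for what SDPNet and HPTNet would supply.

```python
import numpy as np

def depth_position_embedding(depth, tokens, w_proj):
    """Add depth-derived position embeddings to patch tokens.

    depth:  (N,) per-patch depth predictions (SDPNet's role in the paper).
    tokens: (N, D) patch features entering a transformer block.
    w_proj: (1, D) learnable projection mapping scalar depth to the
            embedding dimension (assumed linear for this sketch).
    """
    pos = depth[:, None] @ w_proj  # (N, D) depth-conditioned embeddings
    return tokens + pos            # spatially correlated tokens

# Toy example: 4 patches, embedding dimension 8.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 50.0, size=4)       # hypothetical depths (m)
tokens = rng.standard_normal((4, 8))
w_proj = rng.standard_normal((1, 8)) * 0.01  # learnable during training
out = depth_position_embedding(depth, tokens, w_proj)
print(out.shape)  # (4, 8)
```

The design choice this illustrates is that, unlike fixed sinusoidal or index-based position embeddings, a depth-derived embedding ties each token's positional signal to scene geometry, which is how the paper frames the fix for attention drift.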
