AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation
Kashu Yamazaki, Taisei Hanyu, Minh Tran, Adrian de Luis, Roy McCann, Haitao Liao, Chase Rainwater, Meredith Adkins, Jackson Cothren, Ngan Le
Code
- github.com/UARK-AICV/AerialFormer (official, ★ 71)
Abstract
Aerial image segmentation is semantic segmentation from a top-down perspective, and it poses several challenges: strong imbalance in the foreground-background distribution, complex backgrounds, intra-class heterogeneity, inter-class homogeneity, and tiny objects. To handle these problems, we inherit the advantages of Transformers and propose AerialFormer, which unifies Transformers in the contracting path with lightweight Multi-Dilated Convolutional Neural Networks (MD-CNNs) in the expanding path. AerialFormer is designed as a hierarchical structure in which the Transformer encoder outputs multi-scale features and the MD-CNN decoder aggregates information across those scales. It thus takes both local and global context into account to produce powerful representations and high-resolution segmentation. We benchmark AerialFormer on three common datasets: iSAID, LoveDA, and Potsdam. Comprehensive experiments and extensive ablation studies show that AerialFormer outperforms previous state-of-the-art methods. Our source code will be publicly available upon acceptance.
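To make the "multi-dilated" idea concrete, the sketch below applies one small kernel at several dilation rates and stacks the results per position, so each output sees both nearby and more distant context. It is a 1-D, pure-Python illustration only: the function names, the box-filter kernel, and the dilation rates (1, 2, 3) are assumptions chosen for the example, and the abstract does not specify how the actual MD-CNN decoder block is built.

```python
# Illustrative sketch only: a 1-D "multi-dilated" convolution in pure Python.
# The branch layout, kernel, and dilation rates are assumptions for this
# example, not the design described in the paper.

def dilated_conv1d(signal, kernel, dilation):
    """Same-length 1-D cross-correlation with zero padding."""
    center = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart around i.
            idx = i + (j - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


def multi_dilated_block(signal, kernel, dilations=(1, 2, 3)):
    """Run one branch per dilation rate and stack outputs per position."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    # Each position gets one feature per branch (channel-wise concatenation).
    return [list(features) for features in zip(*branches)]


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


signal = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # unit impulse at index 3
kernel = [1.0, 1.0, 1.0]                       # hypothetical box filter
features = multi_dilated_block(signal, kernel)
```

With a kernel of size 3, the three branches span 3, 5, and 7 input samples respectively, which is why the impulse at index 3 is visible at index 0 only in the branch with dilation 3.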
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| iSAID | AerialFormer-B | mIoU | 69.3 | — | Unverified |
| iSAID | AerialFormer-S | mIoU | 68.4 | — | Unverified |
| iSAID | AerialFormer-T | mIoU | 67.5 | — | Unverified |
| ISPRS Potsdam | AerialFormer-B | Overall Accuracy | 93.9 | — | Unverified |
| LoveDA | AerialFormer-B | Category mIoU | 54.1 | — | Unverified |
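For readers checking the table, the sketch below shows how mIoU and Overall Accuracy are conventionally computed from a confusion matrix. It uses the standard textbook definitions; the helper names and the toy labels are hypothetical, and dataset-specific protocol details (ignored classes, handling of unlabeled pixels, percentage scaling) are not stated on this page and are not modeled here.

```python
# Minimal sketch of standard segmentation metrics in pure Python.
# Labels are flat lists of integer class ids; names are illustrative.

def confusion_matrix(y_true, y_pred, num_classes):
    """cm[t][p] counts pixels with true class t predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def overall_accuracy(cm):
    """Fraction of all pixels that are classified correctly."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) per class; None if the class never occurs."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(cm):
    """Average IoU over classes that appear in labels or predictions."""
    valid = [v for v in per_class_iou(cm) if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Toy example with three classes and six pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
oa = overall_accuracy(cm)
miou = mean_iou(cm)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5, while overall accuracy is 4/6. The two metrics can diverge sharply on imbalanced data, which is one reason the table reports them separately per dataset.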