Cross-Modality Fusion Transformer for Multispectral Object Detection

2021-10-30

Fang Qingyun, Han Dapeng, Wang Zhaokui


Code

github.com/docf/multispectral-object-detection
Official · In paper · PyTorch · ★ 462

Abstract

Multispectral image pairs provide combined information that makes object detection applications more reliable and robust in the open world. To fully exploit the different modalities, we present a simple yet effective cross-modality feature fusion approach named Cross-Modality Fusion Transformer (CFT). Unlike prior CNN-based works, our network is guided by the transformer scheme: it learns long-range dependencies and integrates global contextual information in the feature extraction stage. More importantly, by leveraging the self-attention of the transformer, the network can naturally carry out simultaneous intra-modality and inter-modality fusion and robustly capture the latent interactions between the RGB and thermal domains, thereby significantly improving the performance of multispectral object detection. Extensive experiments and ablation studies on multiple datasets demonstrate that our approach is effective and achieves state-of-the-art detection performance. Our code and models are available at https://github.com/DocF/multispectral-object-detection.
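The abstract's claim that self-attention performs intra-modality and inter-modality fusion at the same time can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes the RGB and thermal feature tokens are concatenated into one sequence before attention, uses a single head with random weights, omits positional embeddings, and is written in plain Python rather than PyTorch so it runs without dependencies. All function and variable names (`fuse`, `rgb`, `thermal`, and so on) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Random stand-in for learned weights or extracted features."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def fuse(rgb_tokens, thermal_tokens, w_q, w_k, w_v):
    """Single-head self-attention over the concatenated token sequence.

    Assumption: both modalities share one sequence, so every token can
    attend to tokens of its own modality and of the other one.
    """
    tokens = rgb_tokens + thermal_tokens            # concatenate along the sequence axis
    d_k = len(w_q[0])
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    k_t = [list(col) for col in zip(*k)]            # transpose of K
    scores = matmul(q, k_t)
    attn = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    fused = matmul(attn, v)
    n_rgb = len(rgb_tokens)
    return fused[:n_rgb], fused[n_rgb:], attn       # split back per modality


rng = random.Random(0)
n_tokens, dim = 4, 8                                # toy sizes, chosen arbitrarily
rgb = random_matrix(n_tokens, dim, rng)
thermal = random_matrix(n_tokens, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

rgb_out, thermal_out, attn = fuse(rgb, thermal, w_q, w_k, w_v)

# The attention matrix has a 2x2 block structure. For the RGB query rows:
intra_rgb = [row[:n_tokens] for row in attn[:n_tokens]]             # RGB attends to RGB
inter_rgb_to_thermal = [row[n_tokens:] for row in attn[:n_tokens]]  # RGB attends to thermal
```

Under these assumptions, one softmax row spreads its weight over both blocks, so each fused RGB token is a mixture of RGB and thermal values computed in a single attention pass. The thermal query rows (`attn[n_tokens:]`) split the same way.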

Tasks

Multispectral Object Detection, Object Detection, Pedestrian Detection

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
FLIR	CFT	mAP50	77.7	—	Unverified
FLIR	YOLOv5 (T)	mAP50	73.9	—	Unverified
FLIR	YOLOv5 (RGB)	mAP50	67.8	—	Unverified
LLVIP	CFT	mAP50	97.5	—	Unverified
