CAFF-DINO: Multi-spectral object detection transformers with cross-attention features fusion

2024-09-27, IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2024

Kevin Helvig, Baptiste Abeloos, Pauline Trouve-Peloux

Abstract

Object detection in images can benefit from coupling multiple spectra, each of which offers specific useful features. However, building an efficient architecture that couples the different modalities is a complex task. Transformers, thanks to their ability to extract meaningful correlations between different regions of the inputs, appear to be a promising way to fuse features across spectra. This work presents a multi-spectral object detection architecture based on cross-attention features fusion (CAFF), combined with a transformer-based detector (DINO). We demonstrate the performance of the proposed approach in object detection, compared with state-of-the-art approaches, on infrared-visible multi-spectral datasets. We also study its robustness to systematic misalignment between image pairs. The proposed approach is generic and applies to any mono-spectrum transformer-based detector. The model developed in this study will be available in a dedicated GitHub repository.
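
The abstract does not detail how the cross-attention fusion is wired, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes single-head scaled dot-product attention in which visible-spectrum tokens act as queries and infrared tokens act as keys and values, followed by a residual sum. The function names (`cross_attention`, `fuse`), the token counts, the feature size, and the random projection matrices are all hypothetical, and plain Python lists are used so that the example runs without dependencies.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix (sequences of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Let tokens of one spectrum attend to tokens of the other spectrum."""
    q = matmul(query_tokens, w_q)
    k = matmul(context_tokens, w_k)
    v = matmul(context_tokens, w_v)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, list(zip(*k)))  # q @ k^T
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def fuse(visible, infrared, w_q, w_k, w_v):
    """Assumed fusion rule: visible tokens plus what they attend to in infrared."""
    attended, weights = cross_attention(visible, infrared, w_q, w_k, w_v)
    fused = [[a + b for a, b in zip(tok, att)] for tok, att in zip(visible, attended)]
    return fused, weights


def rand_matrix(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d = 4                          # hypothetical feature size
visible = rand_matrix(6, d)    # 6 visible-spectrum tokens
infrared = rand_matrix(5, d)   # 5 infrared tokens
w_q, w_k, w_v = (rand_matrix(d, d) for _ in range(3))

fused, weights = fuse(visible, infrared, w_q, w_k, w_v)
print(len(fused), len(fused[0]))  # 6 4
```

The fused tokens keep the shape of the visible-spectrum tokens, which is one way such a module could feed a mono-spectrum detector without changes downstream. Because each attention row is a distribution over all infrared tokens rather than a fixed pixel correspondence, a query is not tied to the same spatial location in the other spectrum.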

Tasks

Multispectral Object Detection, Object Detection, Pedestrian Detection
