
RobSense: A Robust Multi-modal Foundation Model for Remote Sensing with Static, Temporal, and Incomplete Data Adaptability

2025-01-01 · CVPR 2025

Minh Kha Do, Kang Han, Phu Lai, Khoa T. Phan, Wei Xiang


Abstract

Foundation models for remote sensing have garnered increasing attention for their strong performance across various observation tasks. However, current models lack robustness when managing diverse input types and handling incomplete data in downstream tasks. In this paper, we propose RobSense, a robust multi-modal foundation model for multi-spectral and Synthetic Aperture Radar (SAR) data. RobSense is designed with modular components and pre-trained with a combination of temporal multi-modal alignment and masked autoencoder strategies on a large-scale dataset. It can therefore effectively support diverse input types, from static to temporal and uni-modal to multi-modal. To further handle incomplete data, we incorporate two uni-modal latent reconstructors that recover rich representations from incomplete inputs, addressing variability in spectral bands and temporal sequence irregularities. Extensive experiments demonstrate that RobSense consistently outperforms state-of-the-art baselines on complete datasets across four input types for segmentation, classification, and change detection. On incomplete datasets, RobSense outperforms the baselines by considerably larger margins as the missing rate increases. Project page: https://ikhado.github.io/robsense/
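The masked autoencoder pre-training strategy mentioned in the abstract hinges on hiding a large fraction of input patches so that the encoder sees only the visible subset and the decoder learns to reconstruct the rest. Below is a minimal sketch of that random-masking step; the function name, shapes, and mask ratio are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def random_masking(tokens, mask_ratio=0.75, seed=None):
    """Randomly hide a fraction of patch tokens (MAE-style pre-training).

    tokens: (num_patches, dim) array of patch embeddings.
    Returns (visible_tokens, keep_idx, mask) where mask[i] is True for
    patches hidden from the encoder and reconstructed by the decoder.
    """
    rng = np.random.default_rng(seed)
    num_patches = tokens.shape[0]
    num_keep = int(num_patches * (1.0 - mask_ratio))
    # Shuffle patch indices and keep the first num_keep as "visible".
    perm = rng.permutation(num_patches)
    keep_idx = np.sort(perm[:num_keep])
    mask = np.ones(num_patches, dtype=bool)
    mask[keep_idx] = False
    return tokens[keep_idx], keep_idx, mask

# Example: 16 patches of dimension 4; with a 0.75 mask ratio, 4 remain visible.
patches = np.arange(16 * 4, dtype=np.float64).reshape(16, 4)
visible, keep_idx, mask = random_masking(patches, mask_ratio=0.75, seed=0)
```

During pre-training, a reconstruction loss (e.g. mean squared error) would be computed only on the masked positions; the paper's uni-modal latent reconstructors extend this idea to recover representations when spectral bands or temporal steps are missing at inference time.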
