EarthBridge: A Solution for the 4th Multi-modal Aerial View Image Challenge, Translation Track
Zhenyuan Chen, Guanyuan Shen, Feng Zhang
Abstract
Cross-modal image-to-image translation among Electro-Optical (EO), Infrared (IR), and Synthetic Aperture Radar (SAR) sensors is essential for comprehensive multi-modal aerial-view analysis. However, translating between these modalities is notoriously difficult due to their distinct electromagnetic signatures and geometric characteristics. This paper presents EarthBridge, a high-fidelity translation framework developed for the 4th Multi-modal Aerial View Image Challenge -- Translation (MAVIC-T). We explore two distinct methodologies: Diffusion Bridge Implicit Models (DBIM), which we generalize using non-Markovian bridge processes for high-quality deterministic sampling, and Contrastive Unpaired Translation (CUT), which utilizes contrastive learning for structural consistency. Our EarthBridge framework employs a channel-concatenated UNet denoiser trained with Karras-weighted bridge scalings and a specialized "booting noise" initialization to handle the inherent ambiguity in cross-modal mappings. We evaluate these methods across all four challenge tasks (SAR→EO, SAR→RGB, SAR→IR, RGB→IR), achieving superior spatial detail and spectral accuracy. Our solution achieved a composite score of 0.38, securing the second position on the MAVIC-T leaderboard. Code is available at https://github.com/Bili-Sakura/EarthBridge-Preview.
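The abstract mentions a channel-concatenated UNet denoiser, i.e. conditioning the denoiser on the source modality by stacking it with the noisy target along the channel axis. The following is a minimal PyTorch sketch of that conditioning pattern only; the `ConcatDenoiser` class, all channel counts, and image sizes are hypothetical stand-ins, not the actual EarthBridge architecture, and the Karras-weighted bridge scalings and "booting noise" initialization are not modeled here.

```python
import torch
import torch.nn as nn

class ConcatDenoiser(nn.Module):
    """Toy stand-in for a UNet denoiser (hypothetical): the source-modality
    image is concatenated with the noisy target along the channel axis, so
    the network sees both when predicting the denoised target."""
    def __init__(self, src_ch=1, tgt_ch=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(src_ch + tgt_ch, hidden, 3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, tgt_ch, 3, padding=1),  # predict target channels
        )

    def forward(self, noisy_tgt, src):
        # Channel concatenation: condition the denoiser on the source image.
        x = torch.cat([noisy_tgt, src], dim=1)
        return self.net(x)

# Example shapes for an illustrative SAR-to-EO setup (assumed, not from the paper):
sar = torch.randn(2, 1, 64, 64)       # single-channel SAR source
noisy_eo = torch.randn(2, 3, 64, 64)  # noisy three-channel EO/RGB target
model = ConcatDenoiser()
out = model(noisy_eo, sar)
print(tuple(out.shape))  # (2, 3, 64, 64)
```

The output keeps the target modality's channel count, so the same pattern applies to any of the four translation directions by swapping `src_ch` and `tgt_ch`.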