LeAD-M3D: Leveraging Asymmetric Distillation for Real-Time Monocular 3D Detection

2026-03-16Unverified0· sign in to hype

Johannes Meier, Jonathan Michel, Oussema Dhaouadi, Yung-Hsu Yang, Christoph Reich, Zuria Bauer, Stefan Roth, Marc Pollefeys, Jacques Kaiser, Daniel Cremers

arXiv PDF

Unverified — Be the first to reproduce this paper.

Reproduce

Abstract

Real-time monocular 3D object detection remains challenging due to severe depth ambiguity, viewpoint shifts, and the high computational cost of 3D reasoning. Existing approaches either rely on LiDAR or geometric priors to compensate for missing depth or sacrifice efficiency to achieve competitive accuracy. We introduce LeAD-M3D, a monocular 3D detector that achieves state-of-the-art accuracy and real-time inference without extra modalities. Our method is enabled by three key components. Asymmetric Augmentation Denoising Distillation (A2D2) transfers geometric knowledge from a clean-image teacher to a MixUp-noised student via a quality- and importance-weighted depth-feature loss, enabling stronger depth reasoning without LiDAR. 3D-aware Consistent Matching (CM_3D) improves prediction-to-ground truth assignment by integrating 3D MGIoU into the matching score, yielding stable and precise supervision. Finally, Confidence-Gated 3D Inference (CGI_3D) accelerates inference by restricting expensive 3D regression to confident regions. Together, these contributions set a new Pareto frontier for monocular 3D detection: LeAD-M3D achieves state-of-the-art accuracy on KITTI and Waymo, and the best reported car AP on Rope3D, while running up to 3.6\, faster than prior high-accuracy models (e.g., MonoDiff). LeAD-M3D demonstrates that high fidelity and real-time monocular 3D detection is simultaneously attainable, without LiDAR, stereo, or strong geometric assumptions.

LeAD-M3D: Leveraging Asymmetric Distillation for Real-Time Monocular 3D Detection

Abstract

Reproductions