CH3Depth: Efficient and Flexible Depth Foundation Model with Flow Matching

2025-01-01 · CVPR 2025

Jiaqi Li, Yiran Wang, Jinghong Zheng, JunRui Zhang, Liao Shen, Tianqi Liu, Zhiguo Cao

Abstract

Depth estimation is a fundamental task in 3D vision. An ideal depth estimation model is expected to deliver meticulous detail, temporal consistency, and high efficiency. Although existing foundation models perform well in certain aspects, most fall short of fulfilling all of these requirements simultaneously. In this paper, we present CH3Depth, an efficient and flexible model for depth estimation with flow matching to address this challenge. Specifically, 1) we reframe the optimization objective of flow matching as Inversion by Direct Iteration (InDI) to improve accuracy. 2) To enhance efficiency, we propose non-uniform sampling to achieve better predictions with fewer sampling steps. 3) We design the Latent Temporal Stabilizer (LTS) to enhance temporal consistency by aggregating latent codes of adjacent frames, keeping our method lightweight and compatible with video depth estimation. CH3Depth achieves state-of-the-art performance in zero-shot evaluations across multiple image and video datasets, excelling in prediction accuracy, efficiency, and temporal consistency, highlighting its potential as the next foundation model for depth estimation.
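To make the non-uniform sampling idea concrete, here is a minimal sketch of few-step flow-matching inference with a non-uniform timestep schedule. The abstract does not specify CH3Depth's actual schedule or sampler, so `power_timesteps` is a hypothetical power-law schedule and the velocity field is a toy straight-line flow, used only to illustrate how a sampler consumes unevenly spaced steps:

```python
import numpy as np

# Illustrative sketch only: the schedule and flow below are assumptions,
# not the paper's actual method.

def power_timesteps(num_steps: int, power: float = 2.0) -> np.ndarray:
    """Hypothetical non-uniform schedule on [0, 1]: power > 1 clusters
    timesteps near t = 0, spending more of the step budget early."""
    u = np.linspace(0.0, 1.0, num_steps + 1)
    return u ** power

def euler_sample(x0: np.ndarray, velocity_fn, timesteps: np.ndarray) -> np.ndarray:
    """Integrate dx/dt = v(x, t) with Euler steps over a (possibly
    non-uniform) timestep schedule running from t = 0 to t = 1."""
    x = x0
    for t0, t1 in zip(timesteps[:-1], timesteps[1:]):
        x = x + (t1 - t0) * velocity_fn(x, t0)
    return x

# Toy example: a straight-line flow from x0 to target x1 has constant
# velocity v = x1 - x0, so Euler integration over any schedule covering
# [0, 1] reaches x1 exactly, regardless of the spacing.
rng = np.random.default_rng(0)
x0, x1 = rng.normal(size=4), rng.normal(size=4)
ts = power_timesteps(num_steps=4, power=2.0)
result = euler_sample(x0, lambda x, t: x1 - x0, ts)
```

With a learned, time-dependent velocity field the spacing matters: a non-uniform schedule lets a few-step sampler place its steps where the flow changes fastest, which is the efficiency argument the abstract makes.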
