SOTAVerified

Monocular Depth Estimation

Monocular Depth Estimation is the task of estimating the depth value (distance relative to the camera) of each pixel given a single (monocular) RGB image. This challenging task is a key prerequisite for scene understanding in applications such as 3D scene reconstruction, autonomous driving, and augmented reality (AR). State-of-the-art methods usually fall into one of two categories: designing a complex network that is powerful enough to directly regress the depth map, or splitting the input into bins or windows to reduce computational complexity. The most popular benchmarks are the KITTI and NYUv2 datasets. Models are typically evaluated using root mean squared error (RMSE) or absolute relative error.

Source: Defocus Deblurring Using Dual-Pixel Data
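
The two evaluation metrics named in the description can be sketched in a few lines. This is a minimal illustration, not any benchmark's official evaluation script: the function name `depth_metrics`, the `min_depth` validity threshold, and the toy depth maps are all assumptions made for the example. Pixels whose ground-truth depth is at or below the threshold are treated as missing and skipped, which is a common convention for sparse ground truth but is assumed here.

```python
import math


def depth_metrics(pred, gt, min_depth=1e-3):
    """Return (rmse, abs_rel) over pixels with valid ground-truth depth.

    pred and gt are 2D lists (rows of per-pixel depths) of the same shape.
    min_depth is a hypothetical validity threshold: ground-truth values at
    or below it are treated as missing and ignored.
    """
    pairs = [
        (p, g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
        if g > min_depth
    ]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")

    n = len(pairs)
    # RMSE: square root of the mean squared depth error.
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # Absolute relative error: mean of |pred - gt| / gt.
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    return rmse, abs_rel


# Toy 2x2 depth maps; the 0.0 entry marks a pixel without ground truth.
gt = [[2.0, 4.0], [0.0, 10.0]]
pred = [[2.5, 3.0], [1.0, 10.0]]
rmse, abs_rel = depth_metrics(pred, gt)
print(f"RMSE={rmse:.4f}  AbsRel={abs_rel:.4f}")
```

RMSE is expressed in the depth unit and is dominated by large errors at far range, while absolute relative error normalizes each error by the true depth, so the two are usually read together.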

Papers

Showing 1–50 of 876 papers

Title | Status | Hype
Depth Anything V2 | Code | 9
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Code | 9
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data | Code | 9
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7
Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting | Code | 7
DINOv2: Learning Robust Visual Features without Supervision | Code | 6
UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler | Code | 5
DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos | Code | 5
ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | Code | 5
UniDepth: Universal Monocular Metric Depth Estimation | Code | 5
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos | Code | 5
DepthFM: Fast Monocular Depth Estimation with Flow Matching | Code | 4
MonSter: Marry Monodepth to Stereo Unleashes Power | Code | 4
Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | Code | 4
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think | Code | 4
Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Code | 4
Self-Supervised Geometry-Guided Initialization for Robust Monocular Visual Odometry | Code | 4
Distill Any Depth: Distillation Creates a Stronger Monocular Depth Estimator | Code | 4
UniK3D: Universal Camera Monocular 3D Estimation | Code | 4
Relative Pose Estimation through Affine Corrections of Monocular Depth Priors | Code | 3
PF3plat: Pose-Free Feed-Forward 3D Gaussian Splatting | Code | 3
iDisc: Internal Discretization for Monocular Depth Estimation | Code | 3
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3
ECoDepth: Effective Conditioning of Diffusion Models for Monocular Depth Estimation | Code | 3
GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping | Code | 3
Vision Transformers for Dense Prediction | Code | 3
Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer | Code | 3
What Matters When Repurposing Diffusion Models for General Dense Perception Tasks? | Code | 3
RoadBEV: Road Surface Reconstruction in Bird's Eye View | Code | 3
LiftFeat: 3D Geometry-Aware Local Feature Matching | Code | 3
PrimeDepth: Efficient Monocular Depth Estimation with a Stable Diffusion Preimage | Code | 2
Plane2Depth: Hierarchical Adaptive Plane Guidance for Monocular Depth Estimation | Code | 2
Refinement of Monocular Depth Maps via Multi-View Differentiable Rendering | Code | 2
Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging | Code | 2
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 2
Physical 3D Adversarial Attacks against Monocular Depth Estimation in Autonomous Driving | Code | 2
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation | Code | 2
OSMLoc: Single Image-Based Visual Localization in OpenStreetMap with Fused Geometric and Semantic Guidance | Code | 2
BinsFormer: Revisiting Adaptive Bins for Monocular Depth Estimation | Code | 2
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | Code | 2
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation | Code | 2
Mono-ViFI: A Unified Learning Framework for Self-supervised Single- and Multi-frame Monocular Depth Estimation | Code | 2
InvPT: Inverted Pyramid Multi-task Transformer for Dense Scene Understanding | Code | 2
ImOV3D: Learning Open-Vocabulary Point Clouds 3D Object Detection from Only 2D Images | Code | 2
Joint 2D-3D Multi-Task Learning on Cityscapes-3D: 3D Detection, Segmentation, and Depth Estimation | Code | 2
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2
IDOL: Unified Dual-Modal Latent Diffusion for Human-Centric Joint Video-Depth Generation | Code | 2
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV | Code | 2
Enforcing geometric constraints of virtual normal for depth prediction | Code | 2
DurLAR: A High-fidelity 128-channel LiDAR Dataset with Panoramic Ambient and Reflectivity Imagery for Multi-modal Autonomous Driving Applications | Code | 2

No leaderboard results yet.