| Zero-1-to-3: Zero-shot One Image to 3D Object | Mar 20, 2023 | 3D Reconstruction, Image to 3D | Code Available | 4 |
| MonSter: Marry Monodepth to Stereo Unleashes Power | Jan 15, 2025 | Depth Estimation, Monocular Depth Estimation | Code Available | 4 |
| Parameter-Efficient Prompt Tuning Makes Generalized and Calibrated Neural Text Retrievers | Jul 14, 2022 | Retrieval, Text Retrieval | Code Available | 4 |
| Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models | Apr 15, 2025 | Humanoid Control, Reinforcement Learning (RL) | Code Available | 4 |
| Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction | Sep 26, 2024 | 3D Reconstruction, Denoising | Code Available | 4 |
| Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available | 4 |
| Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image | Jul 20, 2023 | Depth Estimation, Image Reconstruction | Code Available | 4 |
| Seg-Zero: Reasoning-Chain Guided Segmentation via Cognitive Reinforcement | Mar 9, 2025 | Domain Generalization, Object Detection | Code Available | 4 |
| CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up | Dec 20, 2024 | 8k, GPU | Code Available | 3 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 |