| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance Segmentation, Long-tail Video Object Segmentation | Code Available | 3 | 5 |
| RobustSAM: Segment Anything Robustly on Degraded Images | Jun 13, 2024 | Deblurring, Image Dehazing | Code Available | 3 | 5 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action Classification, Action Recognition | Code Available | 3 | 5 |
| MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | May 2, 2024 | Combinatorial Optimization, Mixture-of-Experts | Code Available | 3 | 5 |
| DEFOM-Stereo: Depth Foundation Model Based Stereo Matching | Jan 16, 2025 | Depth Estimation, Disparity Estimation | Code Available | 3 | 5 |
| Objaverse-XL: A Universe of 10M+ 3D Objects | Jul 11, 2023 | Diversity, Novel View Synthesis | Code Available | 3 | 5 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignment, Instruction Following | Code Available | 2 | 5 |
| Detecting Everything in the Open World: Towards Universal Object Detection | Mar 21, 2023 | object-detection, Object Detection | Code Available | 2 | 5 |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Apr 22, 2023 | Zero-shot Generalization | Code Available | 2 | 5 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPU, Image Generation | Code Available | 2 | 5 |