| RobustSAM: Segment Anything Robustly on Degraded Images | Jun 13, 2024 | DeblurringImage Dehazing | CodeCode Available | 3 |
| SMART: Scalable Multi-agent Real-time Motion Generation via Next-token Prediction | May 24, 2024 | Autonomous DrivingMotion Generation | CodeCode Available | 3 |
| MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | May 2, 2024 | Combinatorial OptimizationMixture-of-Experts | CodeCode Available | 3 |
| IEPile: Unearthing Large-Scale Schema-Based Information Extraction Corpus | Feb 22, 2024 | Zero-shot Generalization | CodeCode Available | 3 |
| 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations | Feb 18, 2024 | DenoisingRobot Manipulation | CodeCode Available | 3 |
| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance SegmentationLong-tail Video Object Segmentation | CodeCode Available | 3 |
| Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | Oct 12, 2023 | DecoderProbabilistic Time Series Forecasting | CodeCode Available | 3 |
| Separate Anything You Describe | Aug 9, 2023 | Audio Source SeparationNatural Language Queries | CodeCode Available | 3 |
| Objaverse-XL: A Universe of 10M+ 3D Objects | Jul 11, 2023 | DiversityNovel View Synthesis | CodeCode Available | 3 |
| What Language Model to Train if You Have One Million GPU Hours? | Oct 27, 2022 | GPULanguage Modeling | CodeCode Available | 3 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 3 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignmentInstruction Following | CodeCode Available | 2 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Jun 26, 2025 | Optical Flow EstimationZero-shot Generalization | CodeCode Available | 2 |
| RecGPT: A Foundation Model for Sequential Recommendation | Jun 6, 2025 | Decodermodel | CodeCode Available | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | CodeCode Available | 2 |
| Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization | May 21, 2025 | Vision-Language-ActionZero-shot Generalization | CodeCode Available | 2 |
| SAM2MOT: A Novel Paradigm of Multi-Object Tracking by Segmentation | Apr 6, 2025 | Multi-Object TrackingObject | CodeCode Available | 2 |
| Delineate Anything: Resolution-Agnostic Field Boundary Delineation on Satellite Imagery | Apr 3, 2025 | Field Boundary DelineationInstance Segmentation | CodeCode Available | 2 |
| Q-Insight: Understanding Image Quality via Visual Reinforcement Learning | Mar 28, 2025 | DescriptiveImage Quality Assessment | CodeCode Available | 2 |
| Bokehlicious: Photorealistic Bokeh Rendering with Controllable Apertures | Mar 20, 2025 | DeblurringZero-shot Generalization | CodeCode Available | 2 |
| Autoregressive Image Generation with Randomized Parallel Decoding | Mar 13, 2025 | Conditional Image GenerationImage Generation | CodeCode Available | 2 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Mar 12, 2025 | Zero-shot Generalization | CodeCode Available | 2 |
| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality AssessmentLanguage Modeling | CodeCode Available | 2 |
| Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning | Dec 17, 2024 | Denoising | CodeCode Available | 2 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Nov 26, 2024 | GPUImage Generation | CodeCode Available | 2 |