| General Object Foundation Model for Images and Videos at Scale | Dec 14, 2023 | Instance SegmentationLong-tail Video Object Segmentation | CodeCode Available | 3 |
| Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting | Oct 12, 2023 | DecoderProbabilistic Time Series Forecasting | CodeCode Available | 3 |
| Separate Anything You Describe | Aug 9, 2023 | Audio Source SeparationNatural Language Queries | CodeCode Available | 3 |
| Objaverse-XL: A Universe of 10M+ 3D Objects | Jul 11, 2023 | DiversityNovel View Synthesis | CodeCode Available | 3 |
| What Language Model to Train if You Have One Million GPU Hours? | Oct 27, 2022 | GPULanguage Modeling | CodeCode Available | 3 |
| Expanding Language-Image Pretrained Models for General Video Recognition | Aug 4, 2022 | Action ClassificationAction Recognition | CodeCode Available | 3 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignmentInstruction Following | CodeCode Available | 2 |
| WAFT: Warping-Alone Field Transforms for Optical Flow | Jun 26, 2025 | Optical Flow EstimationZero-shot Generalization | CodeCode Available | 2 |
| RecGPT: A Foundation Model for Sequential Recommendation | Jun 6, 2025 | Decodermodel | CodeCode Available | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | CodeCode Available | 2 |