| Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model | Mar 8, 2025 | Image Quality Assessment, Language Modeling | Code Available | 2 |
| Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter | Mar 12, 2025 | Zero-shot Generalization | Code Available | 2 |
| RecGPT: A Foundation Model for Sequential Recommendation | Jun 6, 2025 | Decoder, model | Code Available | 2 |
| Multitask Prompted Training Enables Zero-Shot Task Generalization | Oct 15, 2021 | Benchmarking, Decoder | Code Available | 2 |
| DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment | Jul 3, 2025 | cross-modal alignment, Instruction Following | Code Available | 2 |
| Matryoshka Diffusion Models | Oct 23, 2023 | Image Generation, Zero-shot Generalization | Code Available | 2 |
| Detecting Everything in the Open World: Towards Universal Object Detection | Mar 21, 2023 | object-detection, Object Detection | Code Available | 2 |
| Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression | May 26, 2025 | Zero-shot Generalization | Code Available | 2 |
| Learning to Route Among Specialized Experts for Zero-Shot Generalization | Feb 8, 2024 | parameter-efficient fine-tuning, Zero-shot Generalization | Code Available | 2 |
| LLM+P: Empowering Large Language Models with Optimal Planning Proficiency | Apr 22, 2023 | Zero-shot Generalization | Code Available | 2 |