| Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Jul 17, 2024 | Autonomous Web Navigation, Denoising | Code Available | 5 | 5 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 | 5 |
| SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech, Text to Speech | Code Available | 5 | 5 |
| MMBench: Is Your Multi-modal Model an All-around Player? | Jul 12, 2023 | All, Instruction Following | Code Available | 5 | 5 |
| TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Jul 8, 2024 | Point Tracking | Code Available | 5 | 5 |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | Feb 29, 2024 | Information Retrieval, Large Language Model | Code Available | 5 | 5 |
| Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Sep 21, 2024 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| Improved Distribution Matching Distillation for Fast Image Synthesis | May 23, 2024 | Image Generation | Code Available | 5 | 5 |
| Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Jan 21, 2024 | Decision Making, Language Modeling | Code Available | 5 | 5 |
| Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Jun 10, 2024 | Conditional Image Generation, Image Generation | Code Available | 5 | 5 |
| Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Mar 20, 2024 | Image to Video Generation, Text-to-Video Generation | Code Available | 5 | 5 |
| HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Feb 14, 2025 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Jun 9, 2024 | Instruction Following | Code Available | 5 | 5 |
| VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jun 12, 2024 | Image Generation, Language Modeling | Code Available | 5 | 5 |
| Diffusion for World Modeling: Visual Details Matter in Atari | May 20, 2024 | Image Generation, reinforcement-learning | Code Available | 5 | 5 |
| Flashlight: Enabling Innovation in Tools for Machine Learning | Jan 29, 2022 | BIG-bench Machine Learning | Code Available | 5 | 5 |
| Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | Jan 1, 2024 | Code Generation, parameter-efficient fine-tuning | Code Available | 5 | 5 |
| Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising, Image Generation | Code Available | 5 | 5 |
| BootsTAP: Bootstrapped Training for Tracking-Any-Point | Feb 1, 2024 | Point Tracking | Code Available | 5 | 5 |
| BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | May 14, 2025 | Image Generation | Code Available | 5 | 5 |
| An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Aug 2, 2022 | Image Generation, Personalized Image Generation | Code Available | 5 | 5 |
| ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Jun 26, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 | 5 |
| OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Jul 9, 2024 | | Code Available | 5 | 5 |
| Meta-World+: An Improved, Standardized, RL Benchmark | May 16, 2025 | Meta Reinforcement Learning, reinforcement-learning | Code Available | 5 | 5 |
| MONAI: An open-source framework for deep learning in healthcare | Nov 4, 2022 | Deep Learning, Medical Image Classification | Code Available | 5 | 5 |