| Agent-E: From Autonomous Web Navigation to Foundational Design Principles in Agentic Systems | Jul 17, 2024 | Autonomous Web Navigation, Denoising | Code Available | 5 | 5 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPU, Language Modelling | Code Available | 5 | 5 |
| SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation | Jan 24, 2024 | text-to-speech, Text to Speech | Code Available | 5 | 5 |
| MMBench: Is Your Multi-modal Model an All-around Player? | Jul 12, 2023 | All, Instruction Following | Code Available | 5 | 5 |
| TAPVid-3D: A Benchmark for Tracking Any Point in 3D | Jul 8, 2024 | Point Tracking | Code Available | 5 | 5 |
| Retrieval-Augmented Generation for AI-Generated Content: A Survey | Feb 29, 2024 | Information Retrieval, Large Language Model | Code Available | 5 | 5 |
| Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models | Sep 21, 2024 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| Improved Distribution Matching Distillation for Fast Image Synthesis | May 23, 2024 | Image Generation | Code Available | 5 | 5 |
| Large Language Model based Multi-Agents: A Survey of Progress and Challenges | Jan 21, 2024 | Decision Making, Language Modeling | Code Available | 5 | 5 |
| Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation | Jun 10, 2024 | Conditional Image Generation, Image Generation | Code Available | 5 | 5 |
| Mora: Enabling Generalist Video Generation via A Multi-Agent Framework | Mar 20, 2024 | Image to Video Generation, Text-to-Video Generation | Code Available | 5 | 5 |
| HealthGPT: A Medical Large Vision-Language Model for Unifying Comprehension and Generation via Heterogeneous Knowledge Adaptation | Feb 14, 2025 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models | Jun 9, 2024 | Instruction Following | Code Available | 5 | 5 |
| VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks | Jun 12, 2024 | Image Generation, Language Modeling | Code Available | 5 | 5 |
| Diffusion for World Modeling: Visual Details Matter in Atari | May 20, 2024 | Image Generation, reinforcement-learning | Code Available | 5 | 5 |
| Flashlight: Enabling Innovation in Tools for Machine Learning | Jan 29, 2022 | BIG-bench Machine Learning | Code Available | 5 | 5 |
| Astraios: Parameter-Efficient Instruction Tuning Code Large Language Models | Jan 1, 2024 | Code Generation, parameter-efficient fine-tuning | Code Available | 5 | 5 |
| Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think | Oct 9, 2024 | Denoising, Image Generation | Code Available | 5 | 5 |
| BootsTAP: Bootstrapped Training for Tracking-Any-Point | Feb 1, 2024 | Point Tracking | Code Available | 5 | 5 |
| BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset | May 14, 2025 | Image Generation | Code Available | 5 | 5 |
| An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion | Aug 2, 2022 | Image Generation, Personalized Image Generation | Code Available | 5 | 5 |
| ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation | Jun 26, 2024 | Text-to-Video Generation, Video Generation | Code Available | 5 | 5 |
| OffsetBias: Leveraging Debiased Data for Tuning Evaluators | Jul 9, 2024 | | Code Available | 5 | 5 |
| Meta-World+: An Improved, Standardized, RL Benchmark | May 16, 2025 | Meta Reinforcement Learning, reinforcement-learning | Code Available | 5 | 5 |
| MONAI: An open-source framework for deep learning in healthcare | Nov 4, 2022 | Deep Learning, Medical Image Classification | Code Available | 5 | 5 |
| Secrets of RLHF in Large Language Models Part II: Reward Modeling | Jan 11, 2024 | Contrastive Learning, Meta-Learning | Code Available | 5 | 5 |
| BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | Mar 11, 2024 | Image Inpainting | Code Available | 5 | 5 |
| Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers | Apr 27, 2025 | Hallucination, Question Answering | Code Available | 5 | 5 |
| WeNet 2.0: More Productive End-to-End Speech Recognition Toolkit | Mar 29, 2022 | Decoder, Language Modelling | Code Available | 5 | 5 |
| WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models | Jan 25, 2024 | | Code Available | 5 | 5 |
| MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering | Oct 9, 2024 | | Code Available | 5 | 5 |
| Free Process Rewards without Process Labels | Dec 2, 2024 | Math | Code Available | 5 | 5 |
| VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | Jun 11, 2024 | Multiple-choice, Question Answering | Code Available | 5 | 5 |
| Executable Code Actions Elicit Better LLM Agents | Feb 1, 2024 | Language Modelling, Large Language Model | Code Available | 5 | 5 |
| InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation | Feb 28, 2025 | Audio Generation, Form | Code Available | 5 | 5 |
| PatchRefiner: Leveraging Synthetic Data for Real-Domain High-Resolution Monocular Metric Depth Estimation | Jun 10, 2024 | 3D Reconstruction, Autonomous Driving | Code Available | 5 | 5 |
| ZoeDepth: Zero-shot Transfer by Combining Relative and Metric Depth | Feb 23, 2023 | Depth Estimation, Monocular Depth Estimation | Code Available | 5 | 5 |
| Continuous Thought Machines | May 8, 2025 | Computational Efficiency, Question Answering | Code Available | 5 | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Jan 29, 2024 | Decoder, Mixture-of-Experts | Code Available | 5 | 5 |
| MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining | May 12, 2025 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| Efficient Streaming Language Models with Attention Sinks | Sep 29, 2023 | Language Modeling, Language Modelling | Code Available | 5 | 5 |
| OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs | Nov 21, 2024 | Retrieval | Code Available | 5 | 5 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU, Mathematical Reasoning | Code Available | 5 | 5 |
| Sequencer: Deep LSTM for Image Classification | May 4, 2022 | Domain Generalization, image-classification | Code Available | 5 | 5 |
| FunReason: Enhancing Large Language Models' Function Calling via Self-Refinement Multiscale Loss and Automated Data Refinement | May 26, 2025 | | Code Available | 5 | 5 |
| Darwin Godel Machine: Open-Ended Evolution of Self-Improving Agents | May 29, 2025 | Meta-Learning | Code Available | 5 | 5 |
| EvoGit: Decentralized Code Evolution via Git-Based Multi-Agent Collaboration | Jun 1, 2025 | | Code Available | 5 | 5 |
| Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models | Jun 5, 2025 | Reranking, Retrieval | Code Available | 5 | 5 |
| SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models | Jun 15, 2025 | Logical Reasoning, Reinforcement Learning (RL) | Code Available | 5 | 5 |
| Matrix-Game: Interactive World Foundation Model | Jun 23, 2025 | Minecraft, model | Code Available | 5 | 5 |