| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 | 5 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 | 5 |
| Steering Language Models with Game-Theoretic Solvers | Jan 24, 2024 | Imitation Learning, Scheduling | Code Available | 9 | 5 |
| The Road Less Scheduled | May 24, 2024 | Scheduling | Code Available | 7 | 5 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | Fairness, GPU | Code Available | 7 | 5 |
| Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | Feb 6, 2023 | Scheduling | Code Available | 7 | 5 |
| AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Jun 4, 2025 | Benchmarking, Scheduling | Code Available | 5 | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPU, Quantization | Code Available | 5 | 5 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Mar 13, 2025 | Denoising, Image to text | Code Available | 5 | 5 |
| Orion-14B: Open-source Multilingual Large Language Models | Jan 20, 2024 | Scheduling | Code Available | 4 | 5 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPU, GPU | Code Available | 4 | 5 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 | 5 |
| One Step Diffusion via Shortcut Models | Oct 16, 2024 | Denoising, Scheduling | Code Available | 4 | 5 |
| PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices | May 30, 2024 | Scheduling | Code Available | 4 | 5 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 | 5 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 | 5 |
| Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Oct 3, 2024 | Scheduling | Code Available | 3 | 5 |
| MNN: A Universal and Efficient Inference Engine | Feb 27, 2020 | Deep Learning, Diversity | Code Available | 3 | 5 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 | 5 |
| Fairness in Serving Large Language Models | Dec 31, 2023 | Fairness, Scheduling | Code Available | 3 | 5 |
| FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering | Aug 15, 2024 | Computational Efficiency, Scheduling | Code Available | 3 | 5 |
| Efficiently Serving LLM Reasoning Programs with Certaindex | Dec 30, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 3 | 5 |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available | 3 | 5 |
| A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 | 5 |
| Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System | Apr 23, 2020 | Scheduling | Code Available | 3 | 5 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Mar 4, 2024 | GPU, Scheduling | Code Available | 3 | 5 |
| Vine Copulas as Differentiable Computational Graphs | Jun 16, 2025 | GPU, Scheduling | Code Available | 3 | 5 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 | 5 |
| Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | May 12, 2025 | Drug Design, Scheduling | Code Available | 2 | 5 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 | 5 |
| MineLand: Simulating Large-Scale Multi-Agent Interactions with Limited Multimodal Senses and Physical Needs | Mar 28, 2024 | AI Agent, Minecraft | Code Available | 2 | 5 |
| NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Jun 10, 2024 | Scheduling, Video Editing | Code Available | 2 | 5 |
| Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services | Jun 27, 2024 | Scheduling | Code Available | 2 | 5 |
| Learning to Solve Job Shop Scheduling under Uncertainty | Mar 4, 2024 | Combinatorial Optimization, Deep Reinforcement Learning | Code Available | 2 | 5 |
| MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | Aug 20, 2024 | Denoising, Image Generation | Code Available | 2 | 5 |
| Preble: Efficient Distributed Prompt Scheduling for LLM Serving | May 8, 2024 | GPU, Scheduling | Code Available | 2 | 5 |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Jun 3, 2024 | GPU, Language Modeling | Code Available | 2 | 5 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 | 5 |
| Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs | Oct 18, 2022 | Deep Learning, Scheduling | Code Available | 2 | 5 |
| FlagVNE: A Flexible and Generalizable Reinforcement Learning Framework for Network Resource Allocation | Apr 19, 2024 | Decoder, Network Embedding | Code Available | 2 | 5 |
| ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Jun 10, 2025 | Scheduling | Code Available | 2 | 5 |
| Human-in-the-Loop Large-Scale Predictive Maintenance of Workstations | Jun 23, 2022 | Active Learning, Scheduling | Code Available | 2 | 5 |
| Efficient LLM Scheduling by Learning to Rank | Aug 28, 2024 | Blocking, Chatbot | Code Available | 2 | 5 |
| ChaCha for Online AutoML | Jun 9, 2021 | AutoML, Scheduling | Code Available | 2 | 5 |
| EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Jun 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| BMInf: An Efficient Toolkit for Big Model Inference and Tuning | May 1, 2022 | CPU, GPU | Code Available | 2 | 5 |
| ElegantRL-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning | Dec 11, 2021 | Deep Reinforcement Learning, GPU | Code Available | 2 | 5 |
| Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | May 17, 2025 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent) | Jan 16, 2024 | Scheduling | Code Available | 2 | 5 |
| AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feb 21, 2025 | Scheduling | Code Available | 2 | 5 |