| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 |
| Steering Language Models with Game-Theoretic Solvers | Jan 24, 2024 | Imitation Learning, Scheduling | Code Available | 9 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | Fairness, GPU | Code Available | 7 |
| The Road Less Scheduled | May 24, 2024 | Scheduling | Code Available | 7 |
| Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | Feb 6, 2023 | Scheduling | Code Available | 7 |
| AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Jun 4, 2025 | Benchmarking, Scheduling | Code Available | 5 |
| FlowTok: Flowing Seamlessly Across Text and Image Tokens | Mar 13, 2025 | Denoising, Image to text | Code Available | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPU, Quantization | Code Available | 5 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 |
| One Step Diffusion via Shortcut Models | Oct 16, 2024 | Denoising, Scheduling | Code Available | 4 |
| PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices | May 30, 2024 | Scheduling | Code Available | 4 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPU, GPU | Code Available | 4 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 |
| Orion-14B: Open-source Multilingual Large Language Models | Jan 20, 2024 | Scheduling | Code Available | 4 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 |
| Vine Copulas as Differentiable Computational Graphs | Jun 16, 2025 | GPU, Scheduling | Code Available | 3 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| Efficiently Serving LLM Reasoning Programs with Certaindex | Dec 30, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 3 |
| A Survey on Large Language Model Acceleration based on KV Cache Management | Dec 27, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 |
| Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Oct 3, 2024 | Scheduling | Code Available | 3 |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available | 3 |
| FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering | Aug 15, 2024 | Computational Efficiency, Scheduling | Code Available | 3 |
| Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Mar 4, 2024 | GPU, Scheduling | Code Available | 3 |
| Fairness in Serving Large Language Models | Dec 31, 2023 | Fairness, Scheduling | Code Available | 3 |
| Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System | Apr 23, 2020 | Scheduling | Code Available | 3 |
| MNN: A Universal and Efficient Inference Engine | Feb 27, 2020 | Deep Learning, Diversity | Code Available | 3 |
| SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Jul 15, 2025 | Scheduling | Code Available | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Jun 20, 2025 | Scheduling, Speech Synthesis | Code Available | 2 |
| Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Jun 10, 2025 | Combinatorial Optimization, Imitation Learning | Code Available | 2 |
| ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Jun 10, 2025 | Scheduling | Code Available | 2 |
| Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | May 17, 2025 | Language Modeling, Language Modelling | Code Available | 2 |
| Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | May 12, 2025 | Drug Design, Scheduling | Code Available | 2 |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Apr 8, 2025 | CPU, GPU | Code Available | 2 |
| AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Mar 31, 2025 | Robot Manipulation, Scheduling | Code Available | 2 |
| AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Feb 21, 2025 | Scheduling | Code Available | 2 |
| Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Jan 23, 2025 | Scheduling, Streaming video understanding | Code Available | 2 |
| RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Nov 29, 2024 | Robot Task Planning, Scheduling | Code Available | 2 |
| Efficient LLM Scheduling by Learning to Rank | Aug 28, 2024 | Blocking, Chatbot | Code Available | 2 |
| MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | Aug 20, 2024 | Denoising, Image Generation | Code Available | 2 |
| SustainDC: Benchmarking for Sustainable Data Center Control | Aug 14, 2024 | Benchmarking, Management | Code Available | 2 |
| RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models | Jul 9, 2024 | Decoder, Scheduling | Code Available | 2 |
| Teola: Towards End-to-End Optimization of LLM-based Applications | Jun 29, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services | Jun 27, 2024 | Scheduling | Code Available | 2 |
| EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Jun 22, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Jun 10, 2024 | Scheduling, Video Editing | Code Available | 2 |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Jun 3, 2024 | GPU, Language Modeling | Code Available | 2 |
| Self-Consistent Recursive Diffusion Bridge for Medical Image Translation | May 10, 2024 | Denoising, Scheduling | Code Available | 2 |