| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Jun 30, 2022 | CPU, GPU | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | May 11, 2022 | CPU, Data Augmentation | Code Available | 4 |
| The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Mar 14, 2022 | CPU, Quantization | Code Available | 4 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine Learning, CPU | Code Available | 4 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Nov 22, 2024 | Computational Efficiency, CPU | Code Available | 3 |