| WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 20, 2024 | CPU, GPU | Code Available | 11 |
| Magika: AI-Powered Content-Type Detection | Sep 18, 2024 | CPU, Malware Analysis | Code Available | 11 |
| PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Mar 21, 2025 | CPU, Document Layout Analysis | Code Available | 9 |
| Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Dec 23, 2024 | CPU | Code Available | 9 |
| PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Jun 10, 2024 | CPU, Language Modeling | Code Available | 9 |
| Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Apr 17, 2025 | Code Generation, CPU | Code Available | 7 |
| Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Mar 26, 2025 | CPU, GPU | Code Available | 7 |
| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Jun 24, 2024 | CPU, GPU | Code Available | 7 |
| Full Scaling Automation for Sustainable Development of Green Data Centers | May 1, 2023 | Cloud Computing, CPU | Code Available | 7 |
| Elixir: Train a Large Language Model on a Small GPU Cluster | Dec 10, 2022 | CPU, GPU | Code Available | 7 |
| Fast On-device LLM Inference with NPUs | Jul 8, 2024 | CPU, GPU | Code Available | 5 |
| XFeat: Accelerated Features for Lightweight Image Matching | Apr 30, 2024 | CPU, Keypoint detection and image matching | Code Available | 5 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPU, GPU | Code Available | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPU, GPU | Code Available | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPU, Decoder | Code Available | 5 |
| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Mar 13, 2023 | CPU, GPU | Code Available | 5 |
| Vectorized and performance-portable Quicksort | May 12, 2022 | CPU | Code Available | 5 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPU, GPU | Code Available | 4 |
| SocialED: A Python Library for Social Event Detection | Dec 18, 2024 | CPU, Event Detection | Code Available | 4 |
| InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Oct 21, 2024 | Automated Theorem Proving, CPU | Code Available | 4 |
| Data-Prep-Kit: getting your data ready for LLM application development | Sep 26, 2024 | CPU, Language Modeling | Code Available | 4 |
| SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Aug 14, 2024 | CPU, Motion Planning | Code Available | 4 |
| T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Jun 25, 2024 | Computational Efficiency, CPU | Code Available | 4 |
| Look Once to Hear: Target Speech Hearing with Noisy Examples | May 10, 2024 | CPU, Speech Extraction | Code Available | 4 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPU, GPU | Code Available | 4 |
| Couler: Unified Machine Learning Workflow Optimization in Cloud | Mar 12, 2024 | CPU | Code Available | 4 |
| Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series | Jan 8, 2024 | CPU, Few-Shot Learning | Code Available | 4 |
| FFCV: Accelerating Training by Removing Data Bottlenecks | Jun 21, 2023 | CPU, GPU | Code Available | 4 |
| DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | May 14, 2023 | CPU, Speech Enhancement | Code Available | 4 |
| DAMO-YOLO : A Report on Real-Time Object Detection Design | Nov 23, 2022 | CPU, Neural Architecture Search | Code Available | 4 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Jun 30, 2022 | CPU, GPU | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | May 11, 2022 | CPU, Data Augmentation | Code Available | 4 |
| The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Mar 14, 2022 | CPU, Quantization | Code Available | 4 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine Learning, CPU | Code Available | 4 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Nov 22, 2024 | Computational Efficiency, CPU | Code Available | 3 |
| ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Oct 28, 2024 | CPU | Code Available | 3 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 |
| vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jul 22, 2024 | CPU, GPU | Code Available | 3 |
| Inference Performance Optimization for Large Language Models on CPUs | Jul 10, 2024 | CPU, GPU | Code Available | 3 |
| NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | May 12, 2024 | CPU, Deep Learning | Code Available | 3 |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Feb 10, 2024 | CPU, GPU | Code Available | 3 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoML, CPU | Code Available | 3 |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available | 3 |
| Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Oct 18, 2023 | CPU, GPU | Code Available | 3 |
| Unlimiformer: Long-Range Transformers with Unlimited Length Input | May 2, 2023 | Book summarization, CPU | Code Available | 3 |