| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Oct 28, 2024 | CPU | Code Available |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available |
| vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Jul 22, 2024 | CPU, GPU | Code Available |
| Inference Performance Optimization for Large Language Models on CPUs | Jul 10, 2024 | CPU, GPU | Code Available |
| NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | May 12, 2024 | CPU, Deep Learning | Code Available |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Feb 10, 2024 | CPU, GPU | Code Available |
| MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoML, CPU | Code Available |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available |
| Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Oct 18, 2023 | CPU, GPU | Code Available |
| Unlimiformer: Long-Range Transformers with Unlimited Length Input | May 2, 2023 | Book summarization, CPU | Code Available |