| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Jan 9, 2024 | GPU | Code Available | 3 |
| RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Jan 9, 2024 | GPU, Math | Code Available | 3 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoML, CPU | Code Available | 3 |
| XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Dec 25, 2023 | CPU, Deep Reinforcement Learning | Code Available | 3 |
| Splatter Image: Ultra-Fast Single-View 3D Reconstruction | Dec 20, 2023 | 3D Object Reconstruction, 3D Reconstruction | Code Available | 3 |
| S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Nov 6, 2023 | GPU, parameter-efficient fine-tuning | Code Available | 3 |
| Punica: Multi-Tenant LoRA Serving | Oct 28, 2023 | GPU | Code Available | 3 |
| TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Oct 25, 2023 | Autonomous Driving, GPU | Code Available | 3 |
| Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Oct 18, 2023 | CPU, GPU | Code Available | 3 |