| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available | 3 |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 3 |
| BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Feb 6, 2024 | Binarization, GPU | Code Available | 3 |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available | 3 |
| Biomedical and Clinical English Model Packages in the Stanza Python NLP Library | Jul 29, 2020 | GPU, Named Entity Recognition | Code Available | 3 |
| FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | Jul 16, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPU, Language Modeling | Code Available | 3 |
| Allo: A Programming Model for Composable Accelerator Design | Apr 7, 2024 | GPU, High-Level Synthesis | Code Available | 3 |
| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available | 3 |
| Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Feb 26, 2024 | GPU, Minecraft | Code Available | 3 |