| Paper | Date | Tags | Code |
| --- | --- | --- | --- |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Nov 29, 2022 | GPU, Mixture-of-Experts | Code Available |
| mlpack 3: a fast, flexible machine learning library | Jun 18, 2018 | Benchmarking, BIG-bench Machine Learning | Code Available |
| LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training | Mar 3, 2025 | 3DGS, GPU | Code Available |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Aug 10, 2024 | GPU, Language Modelling | Code Available |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16k, Causal Inference | Code Available |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPU, Mamba | Code Available |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available |
| Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Apr 25, 2024 | GPU | Code Available |
| BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Feb 6, 2024 | Binarization, GPU | Code Available |
| Data Generation for Hardware-Friendly Post-Training Quantization | Oct 29, 2024 | Data Augmentation, GPU | Code Available |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational Efficiency, Dataset Distillation | Code Available |
| Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Jan 9, 2024 | GPU | Code Available |
| InstanSeg: an embedding-based instance segmentation algorithm optimized for accurate, efficient and portable cell segmentation | Aug 28, 2024 | Cell Segmentation, GPU | Code Available |
| Allo: A Programming Model for Composable Accelerator Design | Apr 7, 2024 | GPU, High-Level Synthesis | Code Available |
| FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Jan 25, 2024 | GPU, Quantization | Code Available |
| Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Feb 26, 2024 | GPU, Minecraft | Code Available |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPU, Language Modeling | Code Available |
| Retentive Network: A Successor to Transformer for Large Language Models | Jul 17, 2023 | GPU, Language Modeling | Code Available |
| Consistency Models Made Easy | Jun 20, 2024 | Computational Efficiency, GPU | Code Available |
| Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Jan 16, 2024 | GPU, Quantization | Code Available |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Oct 12, 2024 | Conditional Image Generation, GPU | Code Available |
| Inference Performance Optimization for Large Language Models on CPUs | Jul 10, 2024 | CPU, GPU | Code Available |