| Title | Date | Tasks | Code |
| --- | --- | --- | --- |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available |
| A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models | Oct 17, 2022 | CPU, GPU | Code Available |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Apr 3, 2024 | GPU, Math | Code Available |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | Code Available |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPU, Mamba | Code Available |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational Efficiency, Dataset Distillation | Code Available |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Jun 16, 2025 | Document Summarization, GPU | Code Available |
| GraphNeuralNetworks.jl: Deep Learning on Graphs with Julia | Dec 9, 2024 | Deep Learning, GPU | Code Available |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Aug 10, 2024 | GPU, Language Modelling | Code Available |
| Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Jan 9, 2024 | GPU | Code Available |
| CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation | Oct 12, 2024 | Conditional Image Generation, GPU | Code Available |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16k, Causal Inference | Code Available |
| HadaCore: Tensor Core Accelerated Hadamard Transform Kernel | Dec 12, 2024 | GPU, MMLU | Code Available |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPU, Language Modeling | Code Available |
| ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters | May 4, 2022 | GPU, Imitation Learning | Code Available |
| High-Speed Stereo Visual SLAM for Low-Powered Computing Devices | Oct 5, 2024 | GPU | Code Available |
| Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Feb 26, 2024 | GPU, Minecraft | Code Available |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Jan 31, 2024 | GPU, Quantization | Code Available |
| How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? | Jan 20, 2025 | Computed Tomography (CT), GPU | Code Available |
| Transformers Can Do Arithmetic with the Right Embeddings | May 27, 2024 | GPU, Position | Code Available |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPU, Language Modeling | Code Available |