| Retentive Network: A Successor to Transformer for Large Language Models | Jul 17, 2023 | GPULanguage Modeling | CodeCode Available | 3 | 5 |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Feb 10, 2024 | CPUGPU | CodeCode Available | 3 | 5 |
| BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Feb 6, 2024 | BinarizationGPU | CodeCode Available | 3 | 5 |
| Fine-Tuning Language Models with Just Forward Passes | May 27, 2023 | GPUIn-Context Learning | CodeCode Available | 3 | 5 |
| Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Apr 25, 2024 | GPU | CodeCode Available | 3 | 5 |
| FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | Mar 24, 2023 | 3D Hand Pose EstimationGPU | CodeCode Available | 3 | 5 |
| PanSplat: 4K Panorama Synthesis with Feed-Forward Gaussian Splatting | Dec 16, 2024 | 3D Reconstruction4k | CodeCode Available | 3 | 5 |
| Fast Sampling of Diffusion Models with Exponential Integrator | Apr 29, 2022 | GPU | CodeCode Available | 3 | 5 |
| Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes | Aug 29, 2017 | BIG-bench Machine LearningCPU | CodeCode Available | 3 | 5 |
| OctFusion: Octree-based Diffusion Models for 3D Shape Generation | Aug 27, 2024 | 3D Generation3D Shape Generation | CodeCode Available | 3 | 5 |
| NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | May 12, 2024 | CPUDeep Learning | CodeCode Available | 3 | 5 |
| Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech RecognitionAutomatic Speech Recognition (ASR) | CodeCode Available | 3 | 5 |
| Allo: A Programming Model for Composable Accelerator Design | Apr 7, 2024 | GPUHigh-Level Synthesis | CodeCode Available | 3 | 5 |
| nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources | Sep 5, 2023 | DecoderGPU | CodeCode Available | 3 | 5 |
| Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Nov 22, 2024 | Computational EfficiencyCPU | CodeCode Available | 3 | 5 |
| ABQ-LLM: Arbitrary-Bit Quantized Inference Acceleration for Large Language Models | Aug 16, 2024 | GPUModel Compression | CodeCode Available | 3 | 5 |
| MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Jan 25, 2024 | GPUmodel | CodeCode Available | 3 | 5 |
| MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Dec 28, 2023 | AutoMLCPU | CodeCode Available | 3 | 5 |
| BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Apr 3, 2024 | GPUMath | CodeCode Available | 3 | 5 |
| Modular Duality in Deep Learning | Oct 28, 2024 | Deep LearningGPU | CodeCode Available | 3 | 5 |
| MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | May 30, 2024 | DenoisingGPU | CodeCode Available | 3 | 5 |
| mlpack 3: a fast, flexible machine learning library | Jun 18, 2018 | BenchmarkingBIG-bench Machine Learning | CodeCode Available | 3 | 5 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16kGPU | CodeCode Available | 3 | 5 |
| MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Apr 22, 2024 | Common Sense ReasoningGPU | CodeCode Available | 3 | 5 |
| Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Jun 10, 2024 | 3D Semantic SegmentationComputed Tomography (CT) | CodeCode Available | 3 | 5 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain AdaptationGPU | CodeCode Available | 3 | 5 |
| MetaDE: Evolving Differential Evolution by Differential Evolution | Feb 13, 2025 | Computational EfficiencyGPU | CodeCode Available | 3 | 5 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Nov 29, 2022 | GPUMixture-of-Experts | CodeCode Available | 3 | 5 |
| MobileMamba: Lightweight Multi-Receptive Visual Mamba Network | Nov 24, 2024 | GPUMamba | CodeCode Available | 3 | 5 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPUMultiple-choice | CodeCode Available | 3 | 5 |
| 94% on CIFAR-10 in 3.29 Seconds on a Single GPU | Mar 30, 2024 | GPU | CodeCode Available | 3 | 5 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Jul 10, 2024 | GPUQuantization | CodeCode Available | 3 | 5 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPUGPU | CodeCode Available | 3 | 5 |
| A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models | Oct 17, 2022 | CPUGPU | CodeCode Available | 3 | 5 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuray | Feb 7, 2025 | 4kGeneral Knowledge | CodeCode Available | 3 | 5 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine LearningGPU | CodeCode Available | 3 | 5 |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Aug 10, 2024 | GPULanguage Modelling | CodeCode Available | 3 | 5 |
| LiteGS: A High-Performance Modular Framework for Gaussian Splatting Training | Mar 3, 2025 | 3DGSGPU | CodeCode Available | 3 | 5 |
| LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture | Sep 4, 2024 | GPUMamba | CodeCode Available | 3 | 5 |
| EscherNet: A Generative Model for Scalable View Synthesis | Feb 6, 2024 | 3D ReconstructionGPU | CodeCode Available | 3 | 5 |
| LinFusion: 1 GPU, 1 Minute, 16K Image | Sep 3, 2024 | 16kCausal Inference | CodeCode Available | 3 | 5 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | CodeCode Available | 3 | 5 |
| LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Oct 1, 2024 | GPULanguage Modeling | CodeCode Available | 3 | 5 |
| KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Jan 31, 2024 | GPUQuantization | CodeCode Available | 3 | 5 |
| Dataset Distillation with Neural Characteristic Function: A Minmax Perspective | Jan 1, 2025 | Computational EfficiencyDataset Distillation | CodeCode Available | 3 | 5 |
| ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters | May 4, 2022 | GPUImitation Learning | CodeCode Available | 3 | 5 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | May 7, 2025 | GPU | CodeCode Available | 3 | 5 |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Jun 16, 2025 | Document SummarizationGPU | CodeCode Available | 3 | 5 |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | CodeCode Available | 3 | 5 |
| InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation | Sep 12, 2023 | GPUImage Generation | CodeCode Available | 3 | 5 |