| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio GenerationGPU | CodeCode Available | 5 | 5 |
| FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Jun 11, 2024 | GPU | CodeCode Available | 5 | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational EfficiencyGPU | CodeCode Available | 5 | 5 |
| FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Feb 29, 2024 | GPULanguage Modeling | CodeCode Available | 5 | 5 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Nov 8, 2023 | 8kGPU | CodeCode Available | 5 | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPUQuantization | CodeCode Available | 5 | 5 |
| YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Sep 7, 2022 | GPUObject Detection | CodeCode Available | 5 | 5 |
| XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Nov 22, 2024 | GPU | CodeCode Available | 5 | 5 |
| YOLOv6 v3.0: A Full-Scale Reloading | Jan 13, 2023 | GPUObject Detection | CodeCode Available | 5 | 5 |
| LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Aug 15, 2022 | GPULanguage Modelling | CodeCode Available | 5 | 5 |
| Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Jun 25, 2023 | CPUDecoder | CodeCode Available | 5 | 5 |
| TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Jul 5, 2022 | AutoMLBayesian Inference | CodeCode Available | 5 | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio GenerationAudio Synthesis | CodeCode Available | 5 | 5 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPUMathematical Reasoning | CodeCode Available | 5 | 5 |
| ReLoRA: High-Rank Training Through Low-Rank Updates | Jul 11, 2023 | GPU | CodeCode Available | 5 | 5 |
| Fast On-device LLM Inference with NPUs | Jul 8, 2024 | CPUGPU | CodeCode Available | 5 | 5 |
| KBLaM: Knowledge Base augmented Language Model | Oct 14, 2024 | 8kGPU | CodeCode Available | 5 | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPUobject-detection | CodeCode Available | 5 | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Dec 12, 2024 | GPU | CodeCode Available | 5 | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data AugmentationGPU | CodeCode Available | 5 | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPUGPU | CodeCode Available | 5 | 5 |
| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Mar 13, 2023 | CPUGPU | CodeCode Available | 5 | 5 |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | Sep 30, 2023 | GPU | CodeCode Available | 4 | 5 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine LearningCPU | CodeCode Available | 4 | 5 |
| Deep Patch Visual SLAM | Aug 3, 2024 | GPUVisual Odometry | CodeCode Available | 4 | 5 |
| PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Jan 17, 2024 | GPUIncremental Learning | CodeCode Available | 4 | 5 |
| Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees | Sep 18, 2023 | GPUImputation | CodeCode Available | 4 | 5 |
| GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Aug 2, 2024 | GPUNavigate | CodeCode Available | 4 | 5 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPUInference Optimization | CodeCode Available | 4 | 5 |
| On Scaling Up 3D Gaussian Splatting Training | Jun 26, 2024 | 3DGS3D Reconstruction | CodeCode Available | 4 | 5 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | May 5, 2023 | GPUIn-Context Learning | CodeCode Available | 4 | 5 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPUGPU | CodeCode Available | 4 | 5 |
| Multi-head Temporal Latent Attention | May 19, 2025 | GPUspeech-recognition | CodeCode Available | 4 | 5 |
| Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Jan 27, 2023 | GPUImage Generation | CodeCode Available | 4 | 5 |
| CoTracker: It is Better to Track Together | Jul 14, 2023 | GPUmotion prediction | CodeCode Available | 4 | 5 |
| fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Jul 1, 2024 | GPUPoint cloud reconstruction | CodeCode Available | 4 | 5 |
| NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Jul 18, 2024 | Experimental DesignGPU | CodeCode Available | 4 | 5 |
| Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Jun 4, 2024 | Face SwappingGPU | CodeCode Available | 4 | 5 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPUMixture-of-Experts | CodeCode Available | 4 | 5 |
| Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | Oct 28, 2021 | Deep LearningGPU | CodeCode Available | 4 | 5 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPUPrivacy Preserving | CodeCode Available | 4 | 5 |
| Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Jun 9, 2024 | GPUMamba | CodeCode Available | 4 | 5 |
| FFCV: Accelerating Training by Removing Data Bottlenecks | Jun 21, 2023 | CPUGPU | CodeCode Available | 4 | 5 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated LearningGPU | CodeCode Available | 4 | 5 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous VehiclesBenchmarking | CodeCode Available | 4 | 5 |
| Mamba-FETrack: Frame-Event Tracking via State Space Model | Apr 28, 2024 | GPUMamba | CodeCode Available | 4 | 5 |
| Looking Backward: Streaming Video-to-Video Translation with Feature Banks | May 24, 2024 | GPUTranslation | CodeCode Available | 4 | 5 |
| EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Jan 29, 2023 | GPUNavigate | CodeCode Available | 4 | 5 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPUGPU | CodeCode Available | 4 | 5 |
| fastai: A Layered API for Deep Learning | Feb 11, 2020 | Deep LearningGPU | CodeCode Available | 4 | 5 |