| SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | Mar 25, 2024 | Decoder, GPU | Code Available | 4 |
| SemanticDraw: Towards Real-Time Interactive Content Creation from Image Diffusion Models | Mar 14, 2024 | Blocking, GPU | Code Available | 4 |
| SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting | Mar 8, 2024 | GPU | Code Available | 4 |
| DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models | Feb 29, 2024 | GPU | Code Available | 4 |
| PointMamba: A Simple State Space Model for Point Cloud Analysis | Feb 16, 2024 | GPU, Mamba | Code Available | 4 |
| JAX-Fluids 2.0: Towards HPC for Differentiable CFD of Compressible Two-phase Flows | Feb 7, 2024 | GPU | Code Available | 4 |
| ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Jan 25, 2024 | GPU, Scheduling | Code Available | 4 |
| PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Jan 17, 2024 | GPU, Incremental Learning | Code Available | 4 |
| Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available | 4 |
| LCM-LoRA: A Universal Stable-Diffusion Acceleration Module | Nov 9, 2023 | GPU, Image Generation | Code Available | 4 |
| 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering | Oct 12, 2023 | Dynamic Reconstruction, GPU | Code Available | 4 |
| Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference | Oct 6, 2023 | GPU, Image Generation | Code Available | 4 |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | Sep 30, 2023 | GPU | Code Available | 4 |
| Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees | Sep 18, 2023 | GPU, Imputation | Code Available | 4 |
| Guaranteed Approximation Bounds for Mixed-Precision Neural Operators | Jul 27, 2023 | GPU, Operator learning | Code Available | 4 |
| CoTracker: It is Better to Track Together | Jul 14, 2023 | GPU, motion prediction | Code Available | 4 |
| FFCV: Accelerating Training by Removing Data Bottlenecks | Jun 21, 2023 | CPU, GPU | Code Available | 4 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | May 5, 2023 | GPU, In-Context Learning | Code Available | 4 |
| Real-time volumetric rendering of dynamic humans | Mar 21, 2023 | 3D Reconstruction, GPU | Code Available | 4 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 |
| ArchiSound: Audio Generation with Diffusion | Jan 30, 2023 | Audio Generation, GPU | Code Available | 4 |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Jan 29, 2023 | AudioCaps, Audio Generation | Code Available | 4 |
| EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Jan 29, 2023 | GPU, Navigate | Code Available | 4 |
| Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Jan 27, 2023 | GPU, Image Generation | Code Available | 4 |
| RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Dec 14, 2022 | GPU, Instance Segmentation | Code Available | 4 |
| DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality | Oct 25, 2022 | Deep Reinforcement Learning, GPU | Code Available | 4 |
| Theseus: A Library for Differentiable Nonlinear Optimization | Jul 19, 2022 | GPU | Code Available | 4 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Jun 30, 2022 | CPU, GPU | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| High-Resolution Image Synthesis with Latent Diffusion Models | Dec 20, 2021 | Denoising, GPU | Code Available | 4 |
| Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | Oct 28, 2021 | Deep Learning, GPU | Code Available | 4 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine Learning, CPU | Code Available | 4 |
| fastai: A Layered API for Deep Learning | Feb 11, 2020 | Deep Learning, GPU | Code Available | 4 |
| XGBoost: Scalable GPU Accelerated Learning | Jun 29, 2018 | Cloud Computing, Data Compression | Code Available | 4 |
| Billion-scale similarity search with GPUs | Feb 28, 2017 | GPU, Image Similarity Search | Code Available | 4 |
| FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | Jul 16, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | Code Available | 3 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation, GPU | Code Available | 3 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| Vine Copulas as Differentiable Computational Graphs | Jun 16, 2025 | GPU, Scheduling | Code Available | 3 |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Jun 16, 2025 | Document Summarization, GPU | Code Available | 3 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU, Reinforcement Learning (RL) | Code Available | 3 |
| Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching | Code Available | 3 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | May 7, 2025 | GPU | Code Available | 3 |
| TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies | Apr 11, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| WeatherMesh-3: Fast and accurate operational global weather forecasting | Mar 28, 2025 | Computational Efficiency, GPU | Code Available | 3 |