| DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality | Oct 25, 2022 | Deep Reinforcement Learning, GPU | Code Available | 4 |
| Theseus: A Library for Differentiable Nonlinear Optimization | Jul 19, 2022 | GPU | Code Available | 4 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Jun 30, 2022 | CPU, GPU | Code Available | 4 |
| EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | May 29, 2022 | Autonomous Driving, CPU | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| High-Resolution Image Synthesis with Latent Diffusion Models | Dec 20, 2021 | Denoising, GPU | Code Available | 4 |
| Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | Oct 28, 2021 | Deep Learning, GPU | Code Available | 4 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine Learning, CPU | Code Available | 4 |
| fastai: A Layered API for Deep Learning | Feb 11, 2020 | Deep Learning, GPU | Code Available | 4 |
| XGBoost: Scalable GPU Accelerated Learning | Jun 29, 2018 | Cloud Computing, Data Compression | Code Available | 4 |
| Billion-scale similarity search with GPUs | Feb 28, 2017 | GPU, Image Similarity Search | Code Available | 4 |
| FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale | Jul 16, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| Arctic Inference with Shift Parallelism: Fast and Efficient Open Source Inference System for Enterprise AI | Jul 16, 2025 | GPU | Code Available | 3 |
| Efficient and Generalizable Speaker Diarization via Structured Pruning of Self-Supervised Models | Jun 23, 2025 | Domain Adaptation, GPU | Code Available | 3 |
| ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation | Jun 22, 2025 | GPU, Image Generation | Code Available | 3 |
| Vine Copulas as Differentiable Computational Graphs | Jun 16, 2025 | GPU, Scheduling | Code Available | 3 |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Jun 16, 2025 | Document Summarization, GPU | Code Available | 3 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning | May 24, 2025 | GPU, Reinforcement Learning (RL) | Code Available | 3 |
| Graph-Reward-SQL: Execution-Free Reinforcement Learning for Text-to-SQL via Graph Matching and Stepwise Reward | May 18, 2025 | GPU, Graph Matching | Code Available | 3 |
| FastMap: Revisiting Dense and Scalable Structure from Motion | May 7, 2025 | GPU | Code Available | 3 |
| TensorNEAT: A GPU-accelerated Library for NeuroEvolution of Augmenting Topologies | Apr 11, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| MSCCL++: Rethinking GPU Communication Abstractions for Cutting-edge AI Applications | Apr 11, 2025 | GPU | Code Available | 3 |
| GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Apr 8, 2025 | Computational Efficiency, CPU | Code Available | 3 |
| WeatherMesh-3: Fast and accurate operational global weather forecasting | Mar 28, 2025 | Computational Efficiency, GPU | Code Available | 3 |