| DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models | Feb 29, 2024 | GPU | Code Available | 4 |
| KernelBench: Can LLMs Write Efficient GPU Kernels? | Feb 14, 2025 | GPU | Code Available | 4 |
| 70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Apr 15, 2025 | CPU, GPU | Code Available | 4 |
| GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Oct 27, 2020 | BIG-bench Machine Learning, CPU | Code Available | 4 |
| DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality | Oct 25, 2022 | Deep Reinforcement Learning, GPU | Code Available | 4 |
| Real-time volumetric rendering of dynamic humans | Mar 21, 2023 | 3D Reconstruction, GPU | Code Available | 4 |
| Repurposing Diffusion-Based Image Generators for Monocular Depth Estimation | Dec 4, 2023 | Depth Estimation, GPU | Code Available | 4 |
| RTMDet: An Empirical Study of Designing Real-Time Object Detectors | Dec 14, 2022 | GPU, Instance Segmentation | Code Available | 4 |
| SocialED: A Python Library for Social Event Detection | Dec 18, 2024 | CPU, Event Detection | Code Available | 4 |
| TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Dec 30, 2024 | Audio Generation, GPU | Code Available | 4 |
| fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Jul 1, 2024 | GPU, Point cloud reconstruction | Code Available | 4 |
| 4D Gaussian Splatting for Real-Time Dynamic Scene Rendering | Oct 12, 2023 | Dynamic Reconstruction, GPU | Code Available | 4 |
| Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees | Sep 18, 2023 | GPU, Imputation | Code Available | 4 |
| Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | May 15, 2025 | GPU | Code Available | 4 |
| PLAID: An Efficient Engine for Late Interaction Retrieval | May 19, 2022 | CPU, GPU | Code Available | 4 |
| Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Jun 4, 2024 | Face Swapping, GPU | Code Available | 4 |
| PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | Sep 30, 2023 | GPU | Code Available | 4 |
| CoTracker: It is Better to Track Together | Jul 14, 2023 | GPU, motion prediction | Code Available | 4 |
| PointMamba: A Simple State Space Model for Point Cloud Analysis | Feb 16, 2024 | GPU, Mamba | Code Available | 4 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Apr 15, 2025 | GPU, Inference Optimization | Code Available | 4 |
| OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | May 12, 2025 | GPU, Privacy Preserving | Code Available | 4 |
| FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Mar 3, 2023 | Federated Learning, GPU | Code Available | 4 |
| FFCV: Accelerating Training by Removing Data Bottlenecks | Jun 21, 2023 | CPU, GPU | Code Available | 4 |
| NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Jul 18, 2024 | Experimental Design, GPU | Code Available | 4 |
| Otter: A Multi-Modal Model with In-Context Instruction Tuning | May 5, 2023 | GPU, In-Context Learning | Code Available | 4 |
| Billion-scale similarity search with GPUs | Feb 28, 2017 | GPU, Image Similarity Search | Code Available | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPU, Mixture-of-Experts | Code Available | 4 |
| Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Jan 27, 2023 | GPU, Image Generation | Code Available | 4 |
| AudioLDM: Text-to-Audio Generation with Latent Diffusion Models | Jan 29, 2023 | AudioCaps, Audio Generation | Code Available | 4 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous Vehicles, Benchmarking | Code Available | 4 |
| EvoX: A Distributed GPU-accelerated Framework for Scalable Evolutionary Computation | Jan 29, 2023 | GPU, Navigate | Code Available | 4 |
| fastai: A Layered API for Deep Learning | Feb 11, 2020 | Deep Learning, GPU | Code Available | 4 |
| Multi-head Temporal Latent Attention | May 19, 2025 | GPU, speech-recognition | Code Available | 4 |
| PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Jan 17, 2024 | GPU, Incremental Learning | Code Available | 4 |
| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | GPU, Language Modelling | Code Available | 4 |
| Theseus: A Library for Differentiable Nonlinear Optimization | Jul 19, 2022 | GPU | Code Available | 4 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Nov 29, 2022 | GPU, Mixture-of-Experts | Code Available | 3 |
| Merlin: A Vision Language Foundation Model for 3D Computed Tomography | Jun 10, 2024 | 3D Semantic Segmentation, Computed Tomography (CT) | Code Available | 3 |
| EscherNet: A Generative Model for Scalable View Synthesis | Feb 6, 2024 | 3D Reconstruction, GPU | Code Available | 3 |
| MetaDE: Evolving Differential Evolution by Differential Evolution | Feb 13, 2025 | Computational Efficiency, GPU | Code Available | 3 |
| 3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt | Sep 19, 2024 | 3DGS, GPU | Code Available | 3 |
| ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters | May 4, 2022 | GPU, Imitation Learning | Code Available | 3 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available | 3 |
| MagicPIG: LSH Sampling for Efficient LLM Generation | Oct 21, 2024 | CPU, GPU | Code Available | 3 |
| MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Apr 8, 2024 | GPU, Multiple-choice | Code Available | 3 |
| Arctic Long Sequence Training: Scalable And Efficient Training For Multi-Million Token Sequences | Jun 16, 2025 | Document Summarization, GPU | Code Available | 3 |
| EfficientQAT: Efficient Quantization-Aware Training for Large Language Models | Jul 10, 2024 | GPU, Quantization | Code Available | 3 |
| Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy | Feb 7, 2025 | 4k, General Knowledge | Code Available | 3 |
| M+: Extending MemoryLLM with Scalable Long-Term Memory | Feb 1, 2025 | 16k, GPU | Code Available | 3 |
| LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale | Aug 10, 2024 | GPU, Language Modelling | Code Available | 3 |