| Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Dec 6, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 | 5 |
| Low-resource finetuning of foundation models beats state-of-the-art in histopathology | Jan 9, 2024 | GPU, Self-Supervised Learning | Code Available | 2 | 5 |
| Low-Rank Quantization-Aware Training for LLMs | Jun 10, 2024 | GPU, parameter-efficient fine-tuning | Code Available | 2 | 5 |
| Forecasting GPU Performance for Deep Learning Training and Inference | Jul 18, 2024 | Deep Learning, GPU | Code Available | 2 | 5 |
| Accelerated Quality-Diversity through Massive Parallelism | Feb 2, 2022 | Diversity, GPU | Code Available | 2 | 5 |
| LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Oct 24, 2024 | Clustering, GPU | Code Available | 2 | 5 |
| LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Apr 15, 2024 | GPU | Code Available | 2 | 5 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | May 26, 2024 | GPU, Language Modeling | Code Available | 2 | 5 |
| cuSLINK: Single-linkage Agglomerative Clustering on the GPU | Jun 28, 2023 | Clustering, GPU | Code Available | 2 | 5 |
| LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Aug 31, 2024 | 8k, GPU | Code Available | 2 | 5 |
| LoRA: Low-Rank Adaptation of Large Language Models | Jun 17, 2021 | GPU, Language Modelling | Code Available | 2 | 5 |
| Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Jan 29, 2024 | GPU, Representation Learning | Code Available | 2 | 5 |
| Cross-domain Neural Pitch and Periodicity Estimation | Jan 28, 2023 | CPU, GPU | Code Available | 2 | 5 |
| CrypTen: Secure Multi-Party Computation Meets Machine Learning | Sep 2, 2021 | BIG-bench Machine Learning, GPU | Code Available | 2 | 5 |
| LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models | Mar 4, 2022 | Decoder, GPU | Code Available | 2 | 5 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPU, Image Generation | Code Available | 2 | 5 |
| 360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Jan 1, 2022 | 2k, Depth Estimation | Code Available | 2 | 5 |
| LightSeq2: Accelerated Training for Transformer-based Models on GPUs | Oct 12, 2021 | Decoder, GPU | Code Available | 2 | 5 |
| Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Jun 13, 2024 | Benchmarking, GPU | Code Available | 2 | 5 |
| A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Dec 19, 2023 | GPU | Code Available | 2 | 5 |
| λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space | Feb 7, 2024 | Concept Alignment, GPU | Code Available | 2 | 5 |
| AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Jul 8, 2025 | GPU, reinforcement-learning | Code Available | 2 | 5 |
| Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning | Sep 24, 2021 | Deep Reinforcement Learning, GPU | Code Available | 2 | 5 |
| LightSeq: A High Performance Inference Library for Transformers | Oct 23, 2020 | GPU, Machine Translation | Code Available | 2 | 5 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 | 5 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Oct 16, 2023 | GPU, Image Animation | Code Available | 2 | 5 |
| Latent Neural Operator for Solving Forward and Inverse PDE Problems | Jun 6, 2024 | Computational Efficiency, GPU | Code Available | 2 | 5 |
| LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Jun 27, 2023 | Automated Theorem Proving, GPU | Code Available | 2 | 5 |
| KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Feb 21, 2025 | Audio Generation, FAD | Code Available | 2 | 5 |
| 2nd Place Solution for Waymo Open Dataset Challenge -- Real-time 2D Object Detection | Jun 16, 2021 | 2D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| Learning to Fly in Seconds | Nov 22, 2023 | GPU, Reinforcement Learning (RL) | Code Available | 2 | 5 |
| DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | Oct 5, 2023 | GPU | Code Available | 2 | 5 |
| A User's Guide to KSig: GPU-Accelerated Computation of the Signature Kernel | Jan 13, 2025 | GPU | Code Available | 2 | 5 |
| Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Jun 11, 2024 | Diversity, GPU | Code Available | 2 | 5 |
| Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning | Aug 24, 2021 | CPU, GPU | Code Available | 2 | 5 |
| 2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection | Jun 16, 2021 | 2D Object Detection, Autonomous Driving | Code Available | 2 | 5 |
| MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition | Nov 24, 2020 | GPU, Image Matting | Code Available | 2 | 5 |
| Instant Volumetric Head Avatars | Nov 22, 2022 | Face Model, GPU | Code Available | 2 | 5 |
| INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Sep 25, 2024 | GPU, Quantization | Code Available | 2 | 5 |
| InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Jul 10, 2023 | GPU, Information Retrieval | Code Available | 2 | 5 |
| Invertible Diffusion Models for Compressed Sensing | Mar 25, 2024 | compressed sensing, GPU | Code Available | 2 | 5 |
| JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Nov 16, 2023 | CPU, GPU | Code Available | 2 | 5 |
| AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec | May 26, 2023 | CPU, GPU | Code Available | 2 | 5 |
| Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes | Oct 12, 2023 | GPU, Novel View Synthesis | Code Available | 2 | 5 |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Apr 8, 2025 | CPU, GPU | Code Available | 2 | 5 |
| I-BERT: Integer-only BERT Quantization | Jan 5, 2021 | GPU, Natural Language Inference | Code Available | 2 | 5 |
| ImMesh: An Immediate LiDAR Localization and Meshing Framework | Jan 12, 2023 | CPU, Dimensionality Reduction | Code Available | 2 | 5 |
| HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Feb 2, 2022 | Audio Classification, Event Detection | Code Available | 2 | 5 |
| AutoFocus: Efficient Multi-Scale Inference | Dec 4, 2018 | GPU | Code Available | 2 | 5 |
| HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation | Apr 27, 2022 | Domain Adaptation, GPU | Code Available | 2 | 5 |