| VoxFormer: Sparse Voxel Transformer for Camera-based 3D Semantic Scene Completion | Feb 23, 2023 | 3D geometry, 3D Semantic Scene Completion | Code Available | 3 |
| Cramming: Training a Language Model on a Single GPU in One Day | Dec 28, 2022 | GPU, Language Modeling | Code Available | 3 |
| MegaBlocks: Efficient Sparse Training with Mixture-of-Experts | Nov 29, 2022 | GPU, Mixture-of-Experts | Code Available | 3 |
| What Language Model to Train if You Have One Million GPU Hours? | Oct 27, 2022 | GPU, Language Modeling | Code Available | 3 |
| A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models | Oct 17, 2022 | CPU, GPU | Code Available | 3 |
| PyTorch Image Quality: Metrics for Image Quality Assessment | Aug 31, 2022 | GPU, Image Quality Assessment | Code Available | 3 |
| USB: A Unified Semi-supervised Learning Benchmark for Classification | Aug 12, 2022 | General Classification, GPU | Code Available | 3 |
| ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech | Jul 13, 2022 | Denoising, GPU | Code Available | 3 |
| ASE: Large-Scale Reusable Adversarial Skill Embeddings for Physically Simulated Characters | May 4, 2022 | GPU, Imitation Learning | Code Available | 3 |
| Fast Sampling of Diffusion Models with Exponential Integrator | Apr 29, 2022 | GPU | Code Available | 3 |
| Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Sep 27, 2021 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Code Available | 3 |
| Robust High-Resolution Video Matting with Temporal Guidance | Aug 25, 2021 | 4k, GPU | Code Available | 3 |
| Real-Time High-Resolution Background Matting | Dec 14, 2020 | 4k, GPU | Code Available | 3 |
| Biomedical and Clinical English Model Packages in the Stanza Python NLP Library | Jul 29, 2020 | GPU, Named Entity Recognition | Code Available | 3 |
| Generalized Focal Loss: Learning Qualified and Distributed Bounding Boxes for Dense Object Detection | Jun 8, 2020 | Dense Object Detection, General Classification | Code Available | 3 |
| U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection | May 18, 2020 | Dichotomous Image Segmentation, GPU | Code Available | 3 |
| Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence | Feb 12, 2020 | BIG-bench Machine Learning, GPU | Code Available | 3 |
| mlpack 3: a fast, flexible machine learning library | Jun 18, 2018 | Benchmarking, BIG-bench Machine Learning | Code Available | 3 |
| Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes | Aug 29, 2017 | BIG-bench Machine Learning, CPU | Code Available | 3 |
| U-Net: Convolutional Networks for Biomedical Image Segmentation | May 18, 2015 | Cell Segmentation, Cell Tracking | Code Available | 3 |
| AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Jul 8, 2025 | GPU, reinforcement-learning | Code Available | 2 |
| any4: Learned 4-bit Numeric Representation for LLMs | Jul 7, 2025 | GPU, GSM8K | Code Available | 2 |
| MathOptAI.jl: Embed trained machine learning predictors into JuMP models | Jul 3, 2025 | CPU, Gaussian Processes | Code Available | 2 |
| MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation | Jun 29, 2025 | GPU, Optical Flow Estimation | Code Available | 2 |
| VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions | Jun 29, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| MegaFold: System-Level Optimizations for Accelerating Protein Structure Prediction Models | Jun 24, 2025 | GPU, Protein Folding | Code Available | 2 |
| PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Jun 24, 2025 | Benchmarking, Drug Discovery | Code Available | 2 |
| Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Jun 23, 2025 | GPU, Large Language Model | Code Available | 2 |
| SeerAttention-R: Sparse Attention Adaptation for Long Reasoning | Jun 10, 2025 | 4k, GPU | Code Available | 2 |
| Perceive Anything: Recognize, Explain, Caption, and Segment Anything in Images and Videos | Jun 5, 2025 | GPU, Semantic Segmentation | Code Available | 2 |
| ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS | May 29, 2025 | 3DGS, GPU | Code Available | 2 |
| QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design | May 22, 2025 | CPU, GPU | Code Available | 2 |
| Training Long-Context LLMs Efficiently via Chunk-wise Optimization | May 22, 2025 | 16k, GPU | Code Available | 2 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available | 2 |
| UltraEdit: Training-, Subject-, and Memory-Free Lifelong Editing in Large Language Models | May 20, 2025 | GPU, Lifelong learning | Code Available | 2 |
| VRSplat: Fast and Robust Gaussian Splatting for Virtual Reality | May 15, 2025 | 3DGS, GPU | Code Available | 2 |
| GPU Performance Portability needs Autotuning | Apr 30, 2025 | GPU | Code Available | 2 |
| STCOcc: Sparse Spatial-Temporal Cascade Renovation for 3D Occupancy and Scene Flow Prediction | Apr 28, 2025 | GPU | Code Available | 2 |
| CaRL: Learning Scalable Planning Policies with Simple Rewards | Apr 24, 2025 | Autonomous Driving, CARLA longest6 | Code Available | 2 |
| SG-Reg: Generalizable and Efficient Scene Graph Registration | Apr 20, 2025 | GPU | Code Available | 2 |
| Taccel: Scaling Up Vision-based Tactile Robotics via High-performance GPU Simulation | Apr 17, 2025 | GPU, Object Recognition | Code Available | 2 |
| Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images | Apr 13, 2025 | GPU | Code Available | 2 |
| TorchFX: A modern approach to Audio DSP with PyTorch and GPU acceleration | Apr 11, 2025 | Audio Signal Processing, Benchmarking | Code Available | 2 |
| HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Apr 8, 2025 | CPU, GPU | Code Available | 2 |
| Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors | Apr 7, 2025 | GPU | Code Available | 2 |
| GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration | Apr 3, 2025 | GPU, Quantization | Code Available | 2 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| THEMIS: Towards Practical Intellectual Property Protection for Post-Deployment On-Device Deep Learning Models | Mar 31, 2025 | GPU | Code Available | 2 |
| FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning | Mar 30, 2025 | 2k, GPU | Code Available | 2 |
| CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Mar 28, 2025 | GPU, GSM8K | Code Available | 2 |