| Training Graph Neural Networks with 1000 Layers | Jun 14, 2021 | GPU, Graph Sampling | Code Available | 2 |
| Accelerating Sparse Deep Neural Networks | Apr 16, 2021 | GPU, Math | Code Available | 2 |
| FastMoE: A Fast Mixture-of-Expert Training System | Mar 24, 2021 | GPU, Language Modeling | Code Available | 2 |
| When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute | Feb 24, 2021 | GPU, Language Modeling | Code Available | 2 |
| Boundary-Aware Segmentation Network for Mobile and Web Applications | Jan 12, 2021 | Camouflaged Object Segmentation, Decoder | Code Available | 2 |
| RepVGG: Making VGG-style ConvNets Great Again | Jan 11, 2021 | GPU, Image Classification | Code Available | 2 |
| I-BERT: Integer-only BERT Quantization | Jan 5, 2021 | GPU, Natural Language Inference | Code Available | 2 |
| JAX MD: A Framework for Differentiable Physics | Dec 1, 2020 | GPU | Code Available | 2 |
| MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition | Nov 24, 2020 | GPU, Image Matting | Code Available | 2 |
| RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms | Nov 3, 2020 | Collaborative Filtering, GPU | Code Available | 2 |
| LightSeq: A High Performance Inference Library for Transformers | Oct 23, 2020 | GPU, Machine Translation | Code Available | 2 |
| HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis | Oct 12, 2020 | CPU, GPU | Code Available | 2 |
| Partial FC: Training 10 Million Identities on a Single Machine | Oct 11, 2020 | Face Identification, Face Recognition | Code Available | 2 |
| A Tensor Compiler for Unified Machine Learning Prediction Serving | Oct 9, 2020 | BIG-bench Machine Learning, CPU | Code Available | 2 |
| Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping | Sep 7, 2020 | GPU | Code Available | 2 |
| Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework | Jun 23, 2020 | Benchmarking, GPU | Code Available | 2 |
| FastReID: A Pytorch Toolbox for General Instance Re-identification | Jun 4, 2020 | Face Recognition, GPU | Code Available | 2 |
| Geomstats: A Python Package for Riemannian Geometry in Machine Learning | Apr 7, 2020 | BIG-bench Machine Learning, Clustering | Code Available | 2 |
| Neural Network Compression Framework for fast model inference | Feb 20, 2020 | Binarization, CPU | Code Available | 2 |
| Deep Snake for Real-Time Instance Segmentation | Jan 6, 2020 | GPU, Instance Segmentation | Code Available | 2 |
| JAX, M.D.: A Framework for Differentiable Physics | Dec 9, 2019 | Drug Discovery, GPU | Code Available | 2 |
| Parallel WaveGAN: A fast waveform generation model based on generative adversarial networks with multi-resolution spectrogram | Oct 25, 2019 | Generative Adversarial Network, GPU | Code Available | 2 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Sep 26, 2019 | Common Sense Reasoning, GPU | Code Available | 2 |
| Positive-Unlabeled Compression on the Cloud | Sep 21, 2019 | GPU, Knowledge Distillation | Code Available | 2 |
| Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism | Sep 17, 2019 | GPU, LAMBADA | Code Available | 2 |
| Asymmetric Non-local Neural Networks for Semantic Segmentation | Aug 21, 2019 | GPU, Segmentation | Code Available | 2 |
| Habitat: A Platform for Embodied AI Research | Apr 2, 2019 | Benchmarking, GPU | Code Available | 2 |
| AutoFocus: Efficient Multi-Scale Inference | Dec 4, 2018 | GPU | Code Available | 2 |
| ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware | Dec 2, 2018 | GPU, Image Classification | Code Available | 2 |
| SNIPER: Efficient Multi-Scale Training | May 23, 2018 | GPU, image-classification | Code Available | 2 |
| geomstats: a Python Package for Riemannian Geometry in Machine Learning | May 21, 2018 | BIG-bench Machine Learning, GPU | Code Available | 2 |
| Efficient Neural Audio Synthesis | Feb 23, 2018 | Audio Synthesis, CPU | Code Available | 2 |
| AMC: AutoML for Model Compression and Acceleration on Mobile Devices | Feb 10, 2018 | AutoML, GPU | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| Feature Pyramid Networks for Object Detection | Dec 9, 2016 | GPU, Object | Code Available | 2 |
| GPflow: A Gaussian process library using TensorFlow | Oct 27, 2016 | Gaussian Processes, GPU | Code Available | 2 |
| Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations | Sep 22, 2016 | GPU | Code Available | 2 |
| Fast Algorithms for Convolutional Neural Networks | Sep 30, 2015 | GPU, Pedestrian Detection | Code Available | 2 |
| Relative Entropy Pathwise Policy Optimization | Jul 15, 2025 | GPU | Code Available | 1 |
| LLMThinkBench: Towards Basic Math Reasoning and Overthinking in Large Language Models | Jul 5, 2025 | Benchmarking, GPU | Code Available | 1 |
| FADRM: Fast and Accurate Data Residual Matching for Dataset Distillation | Jun 30, 2025 | Computational Efficiency, Dataset Distillation | Code Available | 1 |
| Fast ground penetrating radar dual-parameter full waveform inversion method accelerated by hybrid compilation of CUDA kernel function and PyTorch | Jun 25, 2025 | Computational Efficiency, GPR | Code Available | 1 |
| Exploiting Lightweight Hierarchical ViT and Dynamic Framework for Efficient Visual Tracking | Jun 25, 2025 | GPU, Visual Tracking | Code Available | 1 |
| DIP: Unsupervised Dense In-Context Post-training of Visual Representations | Jun 23, 2025 | GPU, Meta-Learning | Code Available | 1 |
| CommVQ: Commutative Vector Quantization for KV Cache Compression | Jun 23, 2025 | GPU, GSM8K | Code Available | 1 |
| ConsumerBench: Benchmarking Generative AI Applications on End-User Devices | Jun 21, 2025 | Benchmarking, CPU | Code Available | 1 |
| Farseer: A Refined Scaling Law in Large Language Models | Jun 12, 2025 | GPU | Code Available | 1 |
| Mutual-Supervised Learning for Sequential-to-Parallel Code Translation | Jun 11, 2025 | Code Translation, GPU | Code Available | 1 |
| Diagonal Batching Unlocks Parallelism in Recurrent Memory Transformers for Long Contexts | Jun 5, 2025 | GPU, Scheduling | Code Available | 1 |
| Accelerating AllReduce with a Persistent Straggler | May 29, 2025 | GPU | Code Available | 1 |