| YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Sep 7, 2022 | GPUObject Detection | CodeCode Available | 5 | 5 |
| FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Feb 29, 2024 | GPULanguage Modeling | CodeCode Available | 5 | 5 |
| Fast On-device LLM Inference with NPUs | Jul 8, 2024 | CPUGPU | CodeCode Available | 5 | 5 |
| ReLoRA: High-Rank Training Through Low-Rank Updates | Jul 11, 2023 | GPU | CodeCode Available | 5 | 5 |
| Extreme Compression of Large Language Models via Additive Quantization | Jan 11, 2024 | CPUGPU | CodeCode Available | 5 | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Dec 12, 2024 | GPU | CodeCode Available | 5 | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data AugmentationGPU | CodeCode Available | 5 | 5 |
| FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Mar 13, 2023 | CPUGPU | CodeCode Available | 5 | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio GenerationAudio Synthesis | CodeCode Available | 5 | 5 |
| Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Dec 16, 2022 | Generating 3D Point CloudsGPU | CodeCode Available | 5 | 5 |
| PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Dec 16, 2023 | CPUGPU | CodeCode Available | 5 | 5 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPUMathematical Reasoning | CodeCode Available | 5 | 5 |
| FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Jun 11, 2024 | GPU | CodeCode Available | 5 | 5 |
| Deep Lake: a Lakehouse for Deep Learning | Sep 22, 2022 | Decision MakingDeep Learning | CodeCode Available | 5 | 5 |
| Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Jan 10, 2023 | GPUImitation Learning | CodeCode Available | 5 | 5 |
| FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Oct 16, 2024 | Audio GenerationGPU | CodeCode Available | 5 | 5 |
| TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Jul 5, 2022 | AutoMLBayesian Inference | CodeCode Available | 5 | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational EfficiencyGPU | CodeCode Available | 5 | 5 |
| EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Feb 1, 2023 | GPUobject-detection | CodeCode Available | 5 | 5 |
| KBLaM: Knowledge Base augmented Language Model | Oct 14, 2024 | 8kGPU | CodeCode Available | 5 | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPUQuantization | CodeCode Available | 5 | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | CodeCode Available | 5 | 5 |
| Deep Patch Visual SLAM | Aug 3, 2024 | GPUVisual Odometry | CodeCode Available | 4 | 5 |
| LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Jan 7, 2025 | GPUVisual Question Answering (VQA) | CodeCode Available | 4 | 5 |
| Building reliable sim driving agents by scaling self-play | Feb 20, 2025 | Autonomous VehiclesBenchmarking | CodeCode Available | 4 | 5 |