| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Code Available | 16 | 5 |
| SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Jun 2, 2025 | Action Generation, GPU | Code Available | 11 | 5 |
| FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jul 11, 2024 | GPU, Quantization | Code Available | 11 | 5 |
| LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 3, 2024 | Computational Efficiency, Face Reenactment | Code Available | 11 | 5 |
| WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 20, 2024 | CPU, GPU | Code Available | 11 | 5 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 | 5 |
| MuseTalk: Real-Time High-Fidelity Video Dubbing via Spatio-Temporal Sampling | Oct 14, 2024 | Audio-Visual Synchronization, GPU | Code Available | 9 | 5 |
| SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers | Oct 14, 2024 | Decoder, GPU | Code Available | 9 | 5 |
| MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention | Jul 2, 2024 | GPU, Language Modelling | Code Available | 9 | 5 |
| TorchTitan: One-stop PyTorch native solution for production ready LLM pre-training | Oct 9, 2024 | GPU | Code Available | 9 | 5 |
| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 | 5 |
| Depth Pro: Sharp Monocular Metric Depth in Less Than a Second | Oct 2, 2024 | Depth Estimation, GPU | Code Available | 9 | 5 |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Oct 14, 2024 | Chunking, GPU | Code Available | 9 | 5 |
| LTX-Video: Realtime Video Latent Diffusion | Dec 30, 2024 | Denoising, GPU | Code Available | 9 | 5 |
| Divide and Conquer: High-Resolution Industrial Anomaly Detection via Memory Efficient Tiled Ensemble | Mar 7, 2024 | Anomaly Detection, GPU | Code Available | 9 | 5 |
| LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning | Mar 26, 2024 | GPU, GSM8K | Code Available | 9 | 5 |
| PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Mar 21, 2025 | CPU, Document Layout Analysis | Code Available | 9 | 5 |
| DETRs Beat YOLOs on Real-time Object Detection | Apr 17, 2023 | 2D Object Detection, Decoder | Code Available | 8 | 5 |
| Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Jun 24, 2024 | CPU, GPU | Code Available | 7 | 5 |
| D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Oct 17, 2024 | GPU, Real-Time Object Detection | Code Available | 7 | 5 |
| YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Jul 6, 2022 | 2D Object Detection, GPU | Code Available | 7 | 5 |
| YOLOv12: Attention-Centric Real-Time Object Detectors | Feb 18, 2025 | GPU, Object | Code Available | 7 | 5 |
| xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Nov 4, 2024 | GPU | Code Available | 7 | 5 |
| AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | May 30, 2025 | GPU, Math | Code Available | 7 | 5 |
| Fast Timing-Conditioned Latent Audio Diffusion | Feb 7, 2024 | Audio Generation, GPU | Code Available | 7 | 5 |
| ThunderKittens: Simple, Fast, and Adorable AI Kernels | Oct 27, 2024 | GPU, State Space Models | Code Available | 7 | 5 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Nov 27, 2024 | Fairness, GPU | Code Available | 7 | 5 |
| Mirage: A Multi-Level Superoptimizer for Tensor Programs | May 9, 2024 | GPU, Navigate | Code Available | 7 | 5 |
| ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Oct 1, 2024 | GPU, Imitation Learning | Code Available | 7 | 5 |
| Scalable MatMul-free Language Modeling | Jun 4, 2024 | GPU, Language Modeling | Code Available | 7 | 5 |
| EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Jan 25, 2025 | Benchmarking, Evolutionary Algorithms | Code Available | 7 | 5 |
| Revisiting PCA for time series reduction in temporal dimension | Dec 27, 2024 | Computational Efficiency, Dimensionality Reduction | Code Available | 7 | 5 |
| Elixir: Train a Large Language Model on a Small GPU Cluster | Dec 10, 2022 | CPU, GPU | Code Available | 7 | 5 |
| Pyramidal Flow Matching for Efficient Video Generative Modeling | Oct 8, 2024 | GPU, Text-to-Video Generation | Code Available | 7 | 5 |
| GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Oct 31, 2022 | GPU, Language Modelling | Code Available | 7 | 5 |
| Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | May 14, 2025 | Denoising, Depth Estimation | Code Available | 7 | 5 |
| Labeling supervised fine-tuning data with the scaling law | May 5, 2024 | coreference-resolution, Coreference Resolution | Code Available | 7 | 5 |
| Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Mar 26, 2025 | CPU, GPU | Code Available | 7 | 5 |
| EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Jan 21, 2025 | Feature Engineering, GPU | Code Available | 7 | 5 |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Jan 10, 2024 | GPU, Image Generation | Code Available | 7 | 5 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k, 4k | Code Available | 6 | 5 |
| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Sep 21, 2023 | 4k, GPU | Code Available | 6 | 5 |
| SqueezeLLM: Dense-and-Sparse Quantization | Jun 13, 2023 | GPU, Quantization | Code Available | 6 | 5 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Jul 17, 2023 | GPU, Language Modeling | Code Available | 6 | 5 |
| QLoRA: Efficient Finetuning of Quantized LLMs | May 23, 2023 | Chatbot, GPU | Code Available | 6 | 5 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Jun 1, 2024 | Audio Generation, Audio Synthesis | Code Available | 5 | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | Code Available | 5 | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Aug 21, 2024 | GPU, Quantization | Code Available | 5 | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code Available | 5 | 5 |
| Deep Lake: a Lakehouse for Deep Learning | Sep 22, 2022 | Decision Making, Deep Learning | Code Available | 5 | 5 |