| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Code Available | 16 |
| SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics | Jun 2, 2025 | Action Generation, GPU | Code Available | 11 |
| WebLLM: A High-Performance In-Browser LLM Inference Engine | Dec 20, 2024 | CPU, GPU | Code Available | 11 |
| FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision | Jul 11, 2024 | GPU, Quantization | Code Available | 11 |
| LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control | Jul 3, 2024 | Computational Efficiency, Face Reenactment | Code Available | 11 |
| MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm | Jun 5, 2025 | GPU, Relation | Code Available | 9 |
| PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Mar 21, 2025 | CPU, Document Layout Analysis | Code Available | 9 |
| FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Jan 2, 2025 | GPU, Scheduling | Code Available | 9 |
| LTX-Video: Realtime Video Latent Diffusion | Dec 30, 2024 | Denoising, GPU | Code Available | 9 |
| Liger Kernel: Efficient Triton Kernels for LLM Training | Oct 14, 2024 | Chunking, GPU | Code Available | 9 |