| Title | Date | Tags | Code |
| --- | --- | --- | --- |
| H_2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Jun 24, 2023 | GPU | Code Available |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | May 28, 2024 | GPU, Text Generation | Code Available |
| Accelerating Sparse Deep Neural Networks | Apr 16, 2021 | GPU, Math | Code Available |
| AutoFocus: Efficient Multi-Scale Inference | Dec 4, 2018 | GPU | Code Available |
| Deep Snake for Real-Time Instance Segmentation | Jan 6, 2020 | GPU, Instance Segmentation | Code Available |
| PocketVina Enables Scalable and Highly Accurate Physically Valid Docking through Multi-Pocket Conditioning | Jun 24, 2025 | Benchmarking, Drug Discovery | Code Available |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | May 20, 2025 | GPU, Video Generation | Code Available |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available |
| HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Jul 26, 2024 | Depth Estimation, GPU | Code Available |
| GPU Performance Portability needs Autotuning | Apr 30, 2025 | GPU | Code Available |