| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Sep 21, 2023 | 4k, GPU | Code Available | 6 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Jul 17, 2023 | GPU, Language Modeling | Code Available | 6 |
| SqueezeLLM: Dense-and-Sparse Quantization | Jun 13, 2023 | GPU, Quantization | Code Available | 6 |
| QLoRA: Efficient Finetuning of Quantized LLMs | May 23, 2023 | Chatbot, GPU | Code Available | 6 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | May 27, 2022 | 16k, 4k | Code Available | 6 |
| Group-in-Group Policy Optimization for LLM Agent Training | May 16, 2025 | GPU, Mathematical Reasoning | Code Available | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Apr 22, 2025 | GPU | Code Available | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Dec 12, 2024 | GPU | Code Available | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code Available | 5 |