| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| Slamming: Training a Speech Language Model on One GPU in a Day | Feb 19, 2025 | GPU, Language Modeling | Code Available | 3 |
| RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression | Feb 19, 2025 | GPU | Unverified | 0 |
| LSR-Adapt: Ultra-Efficient Parameter Tuning with Matrix Low Separation Rank Kernel Adaptation | Feb 19, 2025 | GPU, Parameter-Efficient Fine-Tuning | Unverified | 0 |
| Activation-aware Probe-Query: Effective Key-Value Retrieval for Long-Context LLMs Inference | Feb 19, 2025 | GPU, Retrieval | Unverified | 0 |
| Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs | Feb 19, 2025 | GPU | Unverified | 0 |
| GPU-Friendly Laplacian Texture Blending | Feb 19, 2025 | GPU | Unverified | 0 |
| MEX: Memory-efficient Approach to Referring Multi-Object Tracking | Feb 19, 2025 | Autonomous Driving, GPU | Unverified | 0 |
| SPPD: Self-training with Process Preference Learning Using Dynamic Value Margin | Feb 19, 2025 | GPU, Logical Reasoning | Unverified | 0 |
| Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models | Feb 19, 2025 | GPU, Quantization | Code Available | 2 |
| YOLOv12: Attention-Centric Real-Time Object Detectors | Feb 18, 2025 | GPU, Object Detection | Code Available | 7 |