| An Experimental Study of SOTA LiDAR Segmentation Models | Feb 18, 2025 | GPU, Motion Compensation | Unverified | 0 |
| Multimodal Mamba: Decoder-only Multimodal State Space Model via Quadratic to Linear Distillation | Feb 18, 2025 | Decoder, GPU | Code Available | 2 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Feb 18, 2025 | Computational Efficiency, CPU | Code Available | 2 |
| BaKlaVa -- Budgeted Allocation of KV cache for Long-context Inference | Feb 18, 2025 | GPU, Language Modeling | Unverified | 0 |
| GPU Memory Usage Optimization for Backward Propagation in Deep Network Training | Feb 18, 2025 | GPU | Unverified | 0 |
| Myna: Masking-Based Contrastive Learning of Musical Representations | Feb 18, 2025 | Contrastive Learning, Data Augmentation | Code Available | 1 |
| SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings | Feb 18, 2025 | GPU, Safety Alignment | Code Available | 0 |
| Rotate, Clip, and Partition: Towards W2A4KV4 Quantization by Integrating Rotation and Learnable Non-uniform Quantizer | Feb 17, 2025 | GPU, Quantization | Unverified | 0 |
| Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Feb 17, 2025 | GPU, Mixture-of-Experts | Code Available | 0 |
| AdaSplash: Adaptive Sparse Flash Attention | Feb 17, 2025 | GPU, Language Modeling | Code Available | 1 |