| PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices | Mar 15, 2025 | GPU, Scheduling | —Unverified | 0 |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Mar 14, 2025 | GPU | Code Available | 0 |
| Characterizing GPU Resilience and Impact on AI/HPC Systems | Mar 14, 2025 | Attribute, GPU | —Unverified | 0 |
| X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression | Mar 14, 2025 | GPU | —Unverified | 0 |
| Distance-Based Tree-Sliced Wasserstein Distance | Mar 14, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Mar 14, 2025 | GPU, Mamba | —Unverified | 0 |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Mar 14, 2025 | Deep Learning, GPU | Code Available | 0 |
| Speedy MASt3R | Mar 13, 2025 | 3D Scene Reconstruction, GPU | —Unverified | 0 |
| OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | Mar 13, 2025 | channel selection, Contrastive Learning | —Unverified | 0 |
| KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | —Unverified | 0 |