| Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Mar 24, 2025 | GPU, Large Language Model | Unverified | 0 |
| WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Mar 23, 2025 | GPU | Code Available | 0 |
| Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images | Mar 23, 2025 | Autonomous Navigation, Depth Estimation | Code Available | 0 |
| Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability | Mar 21, 2025 | Adversarial Robustness, Bayesian Optimization | Unverified | 0 |
| UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models | Mar 21, 2025 | GPU | Unverified | 0 |
| Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | Mar 21, 2025 | GPU, Scheduling | Unverified | 0 |
| V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Mar 21, 2025 | CPU, GPU | Unverified | 0 |
| Temporal Action Detection Model Compression by Progressive Block Drop | Mar 21, 2025 | Action Detection, Autonomous Driving | Unverified | 0 |
| SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Mar 20, 2025 | CPU, GPU | Unverified | 0 |
| GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Mar 20, 2025 | 3DGS, GPU | Unverified | 0 |