| Fused3S: Fast Sparse Attention on Tensor Cores | May 12, 2025 | GPU | Code Available | 0 |
| Cache-Efficient Posterior Sampling for Reinforcement Learning with LLM-Derived Priors Across Discrete and Continuous Domains | May 12, 2025 | Continuous Control | Unverified | 0 |
| L-SWAG: Layer-Sample Wise Activation with Gradients information for Zero-Shot NAS on Vision Transformers | May 12, 2025 | GPU, Neural Architecture Search | Unverified | 0 |
| SLAG: Scalable Language-Augmented Gaussian Splatting | May 12, 2025 | GPU, Language Modeling | Unverified | 0 |
| Private LoRA Fine-tuning of Open-Source LLMs with Homomorphic Encryption | May 12, 2025 | GPU, Knowledge Base Question Answering | Unverified | 0 |
| Streaming Krylov-Accelerated Stochastic Gradient Descent | May 11, 2025 | GPU, Stochastic Optimization | Unverified | 0 |
| Matrix Is All You Need | May 11, 2025 | All, GPU | Unverified | 0 |
| QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | May 10, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference | May 9, 2025 | CPU, GPU | Unverified | 0 |
| FloE: On-the-Fly MoE Inference on Memory-constrained GPU | May 9, 2025 | CPU, GPU | Unverified | 0 |