| SLOs-Serve: Optimized Serving of Multi-SLO LLMs | Apr 5, 2025 | Chatbot, GPU | Unverified | 0 |
| DeepOHeat-v1: Efficient Operator Learning for Fast and Trustworthy Thermal Simulation and Optimization in 3D-IC Design | Apr 4, 2025 | GPU, Kolmogorov-Arnold Networks | Code Available | 0 |
| Meta-DAN: towards an efficient prediction strategy for page-level handwritten text recognition | Apr 4, 2025 | GPU, Handwritten Text Recognition | Code Available | 1 |
| HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Apr 4, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Accurate GPU Memory Prediction for Deep Learning Jobs through Dynamic Analysis | Apr 4, 2025 | CPU, GPU | Unverified | 0 |
| Scaling Video-Language Models to 10K Frames via Hierarchical Differential Distillation | Apr 3, 2025 | Computational Efficiency, GPU | Code Available | 2 |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Apr 3, 2025 | CPU, GPU | Unverified | 0 |
| Incorporating the ChEES Criterion into Sequential Monte Carlo Samplers | Apr 3, 2025 | Bayesian Inference, GPU | Unverified | 0 |
| GPTAQ: Efficient Finetuning-Free Quantization for Asymmetric Calibration | Apr 3, 2025 | GPU, Quantization | Code Available | 2 |
| A Truncated Newton Method for Optimal Transport | Apr 2, 2025 | GPU | Code Available | 0 |