| Sort-free Gaussian Splatting via Weighted Sum Rendering | Oct 24, 2024 | 3DGS, 3D Scene Reconstruction | Unverified | 0 |
| Parameter-Efficient Fine-Tuning in Large Models: A Survey of Methodologies | Oct 24, 2024 | GPU, parameter-efficient fine-tuning | Unverified | 0 |
| POD-Attention: Unlocking Full Prefill-Decode Overlap for Faster LLM Inference | Oct 23, 2024 | GPU | Code Available | 0 |
| ExpertFlow: Optimized Expert Activation and Token Allocation for Efficient Mixture-of-Experts Inference | Oct 23, 2024 | Computational Efficiency, CPU | Unverified | 0 |
| Is the GPU Half-Empty or Half-Full? Practical Scheduling Techniques for LLMs | Oct 23, 2024 | GPU, Scheduling | Unverified | 0 |
| CoreInfer: Accelerating Large Language Model Inference with Semantics-Inspired Adaptive Sparse Activation | Oct 23, 2024 | GPU, Language Modeling | Unverified | 0 |
| Trajectory Optimization for Spatial Microstructure Control in Electron Beam Metal Additive Manufacturing | Oct 23, 2024 | GPU | Unverified | 0 |
| FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | Oct 22, 2024 | CPU, GPU | Unverified | 0 |
| AI-focused HPC Data Centers Can Provide More Power Grid Flexibility and at Lower Cost | Oct 22, 2024 | CPU, GPU | Unverified | 0 |
| Semantic-guided Search for Efficient Program Repair with Large Language Models | Oct 22, 2024 | GPU, HumanEval | Unverified | 0 |