| Multi-Stage Speech Bandwidth Extension with Flexible Sampling Rate Control | Jun 4, 2024 | Bandwidth ExtensionCPU | CodeCode Available | 2 |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Jun 3, 2024 | GPULanguage Modeling | CodeCode Available | 2 |
| SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM | Jun 3, 2024 | DecoderGPU | CodeCode Available | 2 |
| ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation | Jun 3, 2024 | GPUVideo Generation | CodeCode Available | 2 |
| DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention | May 28, 2024 | GPUMamba | CodeCode Available | 2 |
| ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention | May 28, 2024 | GPURepresentation Learning | CodeCode Available | 2 |
| Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations | May 28, 2024 | GPU | CodeCode Available | 2 |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | May 28, 2024 | GPUText Generation | CodeCode Available | 2 |
| LoQT: Low-Rank Adapters for Quantized Pretraining | May 26, 2024 | GPULanguage Modeling | CodeCode Available | 2 |
| What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions | May 22, 2024 | Data ValuationGPU | CodeCode Available | 2 |