| Compress then Serve: Serving Thousands of LoRA Adapters with Little Overhead | Jun 17, 2024 | GPU, Model Compression | Unverified | 0 |
| Optimized Speculative Sampling for GPU Hardware Accelerators | Jun 16, 2024 | Automatic Speech Recognition, GPU | Code Available | 0 |
| Bypass Back-propagation: Optimization-based Structural Pruning for Large Language Models via Policy Gradient | Jun 15, 2024 | GPU, Network Pruning | Unverified | 0 |
| CancerLLM: A Large Language Model in Cancer Domain | Jun 15, 2024 | GPU, Language Modeling | Unverified | 0 |
| A Training-free Sub-quadratic Cost Transformer Model Serving Framework With Hierarchically Pruned Attention | Jun 14, 2024 | GPU, Question Answering | Unverified | 0 |
| Practical offloading for fine-tuning LLM on commodity GPU via learned sparse projectors | Jun 14, 2024 | CPU, GPU | Code Available | 0 |
| Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics | Jun 14, 2024 | Combinatorial Optimization, CPU | Code Available | 0 |
| PixRO: Pixel-Distributed Rotational Odometry with Gaussian Belief Propagation | Jun 14, 2024 | CPU, GPU | Unverified | 0 |
| Cognitively Inspired Energy-Based World Models | Jun 13, 2024 | GPU | Unverified | 0 |
| WonderWorld: Interactive 3D Scene Generation from a Single Image | Jun 13, 2024 | Depth Estimation, GPU | Unverified | 0 |