| vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | May 7, 2024 | GPU, Management | Code Available | 3 |
| A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | May 7, 2024 | GPU, object-detection | —Unverified | 0 |
| DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid | May 7, 2024 | GPU, Indoor Scene Reconstruction | —Unverified | 0 |
| Open Implementation and Study of BEST-RQ for Speech Processing | May 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | —Unverified | 0 |
| KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization | May 7, 2024 | GPU, Language Modeling | —Unverified | 0 |
| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | GPU, Language Modelling | Code Available | 4 |
| Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression | May 7, 2024 | GPU, Image Compression | —Unverified | 0 |
| Neural Graphics Texture Compression Supporting Random Access | May 6, 2024 | GPU, Image Compression | —Unverified | 0 |
| QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation | May 6, 2024 | GPU | —Unverified | 0 |
| Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs | May 5, 2024 | GPU, Language Modelling | —Unverified | 0 |