| Fast inference with Kronecker-sparse matrices | May 23, 2024 | GPU, Management | Code Available | 1 |
| Attention as an RNN | May 22, 2024 | GPU, Time Series | Code Available | 1 |
| PyramidInfer: Pyramid KV Cache Compression for High-throughput LLM Inference | May 21, 2024 | GPU | Code Available | 1 |
| Token-wise Influential Training Data Retrieval for Large Language Models | May 20, 2024 | CPU, GPU | Code Available | 1 |
| Hybrid CNN-Transformer Architecture for Efficient Large-Scale Video Snapshot Compressive Imaging | May 19, 2024 | GPU | Code Available | 1 |
| HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models | May 16, 2024 | GPU, Language Modelling | Code Available | 1 |
| No Time to Waste: Squeeze Time into Channel for Mobile Video Understanding | May 14, 2024 | Action Detection, GPU | Code Available | 1 |
| The Developing Human Connectome Project: A Fast Deep Learning-based Pipeline for Neonatal Cortical Surface Reconstruction | May 14, 2024 | GPU, Surface Reconstruction | Code Available | 1 |
| Computation-Aware Kalman Filtering and Smoothing | May 14, 2024 | GPU | Code Available | 1 |
| Differentiable Model Scaling using Differentiable Topk | May 12, 2024 | GPU, Image Classification | Code Available | 1 |