| Mirage: A Multi-Level Superoptimizer for Tensor Programs | May 9, 2024 | GPU, Navigate | Code Available | 7 |
| Preble: Efficient Distributed Prompt Scheduling for LLM Serving | May 8, 2024 | GPU, Scheduling | Code Available | 2 |
| You Only Cache Once: Decoder-Decoder Architectures for Language Models | May 8, 2024 | Decoder, GPU | Code Available | 0 |
| Vidur: A Large-Scale Simulation Framework For LLM Inference | May 8, 2024 | CPU, GPU | Code Available | 4 |
| Open Implementation and Study of BEST-RQ for Speech Processing | May 7, 2024 | Automatic Speech Recognition, Automatic Speech Recognition (ASR) | Unverified | 0 |
| A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | May 7, 2024 | GPU, object-detection | Unverified | 0 |
| Group-aware Parameter-efficient Updating for Content-Adaptive Neural Video Compression | May 7, 2024 | GPU, Image Compression | Unverified | 0 |
| DistGrid: Scalable Scene Reconstruction with Distributed Multi-resolution Hash Grid | May 7, 2024 | GPU, Indoor Scene Reconstruction | Unverified | 0 |
| QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | May 7, 2024 | GPU, Language Modelling | Code Available | 4 |
| vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | May 7, 2024 | GPU, Management | Code Available | 3 |
| KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization | May 7, 2024 | GPU, Language Modeling | Unverified | 0 |
| SwiftRL: Towards Efficient Reinforcement Learning on Real Processing-In-Memory Systems | May 7, 2024 | CPU, GPU | Code Available | 0 |
| Neural Graphics Texture Compression Supporting Random Access | May 6, 2024 | GPU, Image Compression | Unverified | 0 |
| QuadraNet V2: Efficient and Sustainable Training of High-Order Neural Networks with Quadratic Adaptation | May 6, 2024 | GPU | Unverified | 0 |
| Get more for less: Principled Data Selection for Warming Up Fine-Tuning in LLMs | May 5, 2024 | GPU, Language Modelling | Unverified | 0 |
| Labeling supervised fine-tuning data with the scaling law | May 5, 2024 | coreference-resolution, Coreference Resolution | Code Available | 7 |
| UniDEC : Unified Dual Encoder and Classifier Training for Extreme Multi-Label Classification | May 4, 2024 | Extreme Multi-Label Classification, GPU | Unverified | 0 |
| Fast Algorithms for Spiking Neural Network Simulation with FPGAs | May 3, 2024 | GPU, High-Level Synthesis | Code Available | 0 |
| SoftMCL: Soft Momentum Contrastive Learning for Fine-grained Sentiment-aware Pre-training | May 3, 2024 | Contrastive Learning, GPU | Code Available | 0 |
| Structural Pruning of Pre-trained Language Models via Neural Architecture Search | May 3, 2024 | GPU, Natural Language Understanding | Code Available | 0 |
| MTDT: A Multi-Task Deep Learning Digital Twin | May 2, 2024 | Deep Learning, GPU | Unverified | 0 |
| Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers | May 2, 2024 | GPU, Self-Supervised Learning | Unverified | 0 |
| FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials | May 2, 2024 | GPU | Code Available | 2 |
| Deep Learning Models in Speech Recognition: Measuring GPU Energy Consumption, Impact of Noise and Model Quantization for Edge Deployment | May 2, 2024 | GPU, NVIDIA Jetson Orin Nano | Code Available | 0 |
| Addressing Diverging Training Costs using BEVRestore for High-resolution Bird's Eye View Map Construction | May 2, 2024 | Collision Avoidance, GPU | Unverified | 0 |