| Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection | Dec 6, 2024 | GPU, Multi-Object Tracking | Unverified | 0 |
| GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Dec 6, 2024 | GPU | Unverified | 0 |
| Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | Dec 6, 2024 | GPU, Language Modeling | Unverified | 0 |
| Transformers Can Navigate Mazes With Multi-Step Prediction | Dec 6, 2024 | GPU, Language Modeling | Code Available | 1 |
| DHIL-GT: Scalable Graph Transformer with Decoupled Hierarchy Labeling | Dec 6, 2024 | GPU | Unverified | 0 |
| SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization | Dec 5, 2024 | Clustering, GPU | Unverified | 0 |
| p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Dec 5, 2024 | Decoder, GPU | Code Available | 1 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code Available | 5 |
| Assessing and Learning Alignment of Unimodal Vision and Language Models | Dec 5, 2024 | GPU, Semantic Segmentation | Unverified | 0 |
| FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness | Dec 4, 2024 | GPU, Quantization | Unverified | 0 |