| MCP-MedSAM: A Powerful Lightweight Medical Segment Anything Model Trained with a Single GPU in Just One Day | Dec 8, 2024 | GPU, Image Segmentation | Code Available | 1 |
| Batch-Max: Higher LLM Throughput using Larger Batch Sizes and KV Cache Compression | Dec 7, 2024 | GPU | Unverified | 0 |
| Code generation and runtime techniques for enabling data-efficient deep learning training on GPUs | Dec 6, 2024 | Code Generation, Deep Learning | Unverified | 0 |
| APOLLO: SGD-like Memory, AdamW-level Performance | Dec 6, 2024 | GPU, Quantization | Code Available | 3 |
| Flash Communication: Reducing Tensor Parallelization Bottleneck for Fast Large Language Model Inference | Dec 6, 2024 | GPU, Language Modeling | Unverified | 0 |
| Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Dec 6, 2024 | 3D Reconstruction, 3D Scene Reconstruction | Code Available | 2 |
| DHIL-GT: Scalable Graph Transformer with Decoupled Hierarchy Labeling | Dec 6, 2024 | GPU | Unverified | 0 |
| Beyond Boxes: Mask-Guided Spatio-Temporal Feature Aggregation for Video Object Detection | Dec 6, 2024 | GPU, Multi-Object Tracking | Unverified | 0 |
| GUIDE: A Global Unified Inference Engine for Deploying Large Language Models in Heterogeneous Environments | Dec 6, 2024 | GPU | Unverified | 0 |
| Transformers Can Navigate Mazes With Multi-Step Prediction | Dec 6, 2024 | GPU, Language Modeling | Code Available | 1 |
| DEIM: DETR with Improved Matching for Fast Convergence | Dec 5, 2024 | Data Augmentation, GPU | Code Available | 5 |
| SKIM: Any-bit Quantization Pushing The Limits of Post-Training Quantization | Dec 5, 2024 | Clustering, GPU | Unverified | 0 |
| Assessing and Learning Alignment of Unimodal Vision and Language Models | Dec 5, 2024 | GPU, Semantic Segmentation | Unverified | 0 |
| p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay | Dec 5, 2024 | Decoder, GPU | Code Available | 1 |
| Beyond [cls]: Exploring the true potential of Masked Image Modeling representations | Dec 4, 2024 | GPU, Self-Supervised Learning | Code Available | 1 |
| FlashAttention on a Napkin: A Diagrammatic Approach to Deep Learning IO-Awareness | Dec 4, 2024 | GPU, Quantization | Unverified | 0 |
| Diffusion-VLA: Generalizable and Interpretable Robot Foundation Model via Self-Generated Reasoning | Dec 4, 2024 | GPU | Unverified | 0 |
| Unifying KV Cache Compression for Large Language Models with LeanKV | Dec 4, 2024 | GPU, Quantization | Unverified | 0 |
| CLAP: Unsupervised 3D Representation Learning for Fusion 3D Perception via Curvature Sampling and Prototype Learning | Dec 4, 2024 | GPU, Representation Learning | Unverified | 0 |
| SJTU:Spatial judgments in multimodal models towards unified segmentation through coordinate detection | Dec 3, 2024 | GPU, Image Segmentation | Code Available | 0 |
| Can't Slow me Down: Learning Robust and Hardware-Adaptive Object Detectors against Latency Attacks for Edge Devices | Dec 3, 2024 | Autonomous Driving, GPU | Unverified | 0 |
| Improving feature interactions at Pinterest under industry constraints | Dec 2, 2024 | GPU, Recommendation Systems | Unverified | 0 |
| Quantization-Aware Imitation-Learning for Resource-Efficient Robotic Control | Dec 2, 2024 | Autonomous Driving, Decision Making | Unverified | 0 |
| MamKPD: A Simple Mamba Baseline for Real-Time 2D Keypoint Detection | Dec 2, 2024 | Animal Pose Estimation, GPU | Unverified | 0 |
| Memory-Efficient Training for Deep Speaker Embedding Learning in Speaker Verification | Dec 2, 2024 | GPU, Quantization | Unverified | 0 |