| Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability | Mar 21, 2025 | Adversarial Robustness, Bayesian Optimization | Unverified | 0 |
| GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Mar 20, 2025 | 3DGS, GPU | Unverified | 0 |
| SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Mar 20, 2025 | CPU, GPU | Unverified | 0 |
| DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Mar 20, 2025 | GPU | Code Available | 2 |
| ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming | Mar 19, 2025 | GPU | Unverified | 0 |
| Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection | Mar 19, 2025 | Anomaly Detection, Federated Learning | Unverified | 0 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPU, Language Modeling | Code Available | 2 |
| Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks | Mar 18, 2025 | GPU, parameter-efficient fine-tuning | Code Available | 1 |
| Bolt3D: Generating 3D Scenes in Seconds | Mar 18, 2025 | 3D geometry, 3D Reconstruction | Unverified | 0 |
| DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation | Mar 18, 2025 | Denoising, GPU | Code Available | 1 |
| TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | Mar 18, 2025 | GPU, object-detection | Unverified | 0 |
| Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation | Mar 18, 2025 | 3DGS, GPU | Unverified | 0 |
| MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis | Mar 17, 2025 | GPU | Unverified | 0 |
| ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning | Mar 17, 2025 | GPU, Model Compression | Unverified | 0 |
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Mar 17, 2025 | Chunking, GPU | Unverified | 0 |
| Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | Unverified | 0 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 |
| RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds | Mar 16, 2025 | GPU | Code Available | 2 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices | Mar 15, 2025 | GPU, Scheduling | Unverified | 0 |
| Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs | Mar 15, 2025 | GPU | Unverified | 0 |
| X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression | Mar 14, 2025 | GPU | Unverified | 0 |
| APLA: A Simple Adaptation Method for Vision Transformers | Mar 14, 2025 | Classification, GPU | Code Available | 1 |
| Characterizing GPU Resilience and Impact on AI/HPC Systems | Mar 14, 2025 | Attribute, GPU | Unverified | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Mar 14, 2025 | GPU, Mamba | Unverified | 0 |