| Splat-LOAM: Gaussian Splatting LiDAR Odometry and Mapping | Mar 21, 2025 | GPU, Motion Estimation | Code Available | 2 |
| GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Mar 20, 2025 | 3DGS, GPU | Unverified | 0 |
| SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Mar 20, 2025 | CPU, GPU | Unverified | 0 |
| DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding | Mar 20, 2025 | GPU | Code Available | 2 |
| ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming | Mar 19, 2025 | GPU | Unverified | 0 |
| Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection | Mar 19, 2025 | Anomaly Detection, Federated Learning | Unverified | 0 |
| Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels | Mar 18, 2025 | GPU, Language Modeling | Code Available | 2 |
| Empowering Smaller Models: Tuning LLaMA and Gemma with Chain-of-Thought for Ukrainian Exam Tasks | Mar 18, 2025 | GPU, parameter-efficient fine-tuning | Code Available | 1 |
| Bolt3D: Generating 3D Scenes in Seconds | Mar 18, 2025 | 3D geometry, 3D Reconstruction | Unverified | 0 |
| TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | Mar 18, 2025 | GPU, object-detection | Unverified | 0 |
| DIFFVSGG: Diffusion-Driven Online Video Scene Graph Generation | Mar 18, 2025 | Denoising, GPU | Code Available | 1 |
| Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation | Mar 18, 2025 | 3DGS, GPU | Unverified | 0 |
| MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis | Mar 17, 2025 | GPU | Unverified | 0 |
| ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning | Mar 17, 2025 | GPU, Model Compression | Unverified | 0 |
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Mar 17, 2025 | Chunking, GPU | Unverified | 0 |
| Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | Unverified | 0 |
| MaTVLM: Hybrid Mamba-Transformer for Efficient Vision-Language Modeling | Mar 17, 2025 | GPU, Language Modeling | Code Available | 2 |
| RENO: Real-Time Neural Compression for 3D LiDAR Point Clouds | Mar 16, 2025 | GPU | Code Available | 2 |
| ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Mar 16, 2025 | CPU, GPU | Code Available | 3 |
| PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices | Mar 15, 2025 | GPU, Scheduling | Unverified | 0 |
| Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs | Mar 15, 2025 | GPU | Unverified | 0 |
| X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression | Mar 14, 2025 | GPU | Unverified | 0 |
| APLA: A Simple Adaptation Method for Vision Transformers | Mar 14, 2025 | Classification, GPU | Code Available | 1 |
| Characterizing GPU Resilience and Impact on AI/HPC Systems | Mar 14, 2025 | Attribute, GPU | Unverified | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Mar 14, 2025 | GPU, Mamba | Unverified | 0 |
| Distance-Based Tree-Sliced Wasserstein Distance | Mar 14, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Mar 14, 2025 | GPU | Code Available | 0 |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Mar 14, 2025 | Deep Learning, GPU | Code Available | 0 |
| OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | Mar 13, 2025 | channel selection, Contrastive Learning | Unverified | 0 |
| KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | Unverified | 0 |
| Speedy MASt3R | Mar 13, 2025 | 3D Scene Reconstruction, GPU | Unverified | 0 |
| Low Complexity Point Tracking of the Myocardium in 2D Echocardiography | Mar 13, 2025 | GPU, Point Tracking | Code Available | 1 |
| VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | Mar 12, 2025 | GPU, Streaming video understanding | Unverified | 0 |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Mar 12, 2025 | GPU | Code Available | 0 |
| MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Mar 12, 2025 | Benchmarking, GPU | Unverified | 0 |
| Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mar 12, 2025 | Blocking, GPU | Unverified | 0 |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Mar 12, 2025 | CPU, GPU | Unverified | 0 |
| TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting | Mar 11, 2025 | GPU | Unverified | 0 |
| OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Mar 11, 2025 | GPU, Mamba | Code Available | 2 |
| Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Mar 11, 2025 | GPU | Code Available | 0 |
| Accelerating MoE Model Inference with Expert Sharding | Mar 11, 2025 | Decoder, GPU | Unverified | 0 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPU, Image Generation | Code Available | 2 |
| Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices | Mar 10, 2025 | CPU, GPU | Unverified | 0 |
| Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data | Mar 10, 2025 | GPU | Unverified | 0 |
| AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution | Mar 10, 2025 | GPU, Super-Resolution | Unverified | 0 |
| Short-Term Load Forecasting for AI-Data Center | Mar 10, 2025 | GPU, Load Forecasting | Unverified | 0 |
| AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU | Mar 10, 2025 | Face Recognition, GPU | Unverified | 0 |
| Efficient Distillation of Classifier-Free Guidance using Adapters | Mar 10, 2025 | GPU | Code Available | 0 |
| Global Context Is All You Need for Parallel Efficient Tractography Parcellation | Mar 10, 2025 | All, Data Augmentation | Unverified | 0 |
| A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation | Mar 9, 2025 | GPU | Unverified | 0 |