| Stochastic Engrams for Efficient Continual Learning with Binarized Neural Networks | Mar 27, 2025 | Computational Efficiency, Continual Learning | Unverified | 0 |
| Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time | Mar 27, 2025 | CPU, GPU | Unverified | 0 |
| Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding | Mar 26, 2025 | GPU, Question Answering | Unverified | 0 |
| High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching | Mar 26, 2025 | GPU, Image Generation | Unverified | 0 |
| AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation | Mar 25, 2025 | Domain Adaptation, GPU | Code Available | 0 |
| Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification | Mar 25, 2025 | Breast Cancer Detection, GPU | Unverified | 0 |
| PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch | Mar 25, 2025 | CPU, GPU | Unverified | 0 |
| Improved Alignment of Modalities in Large Vision Language Models | Mar 25, 2025 | GPU, Image Captioning | Unverified | 0 |
| Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding | Mar 24, 2025 | 8k, GPU | Unverified | 0 |
| GRiNS: A Python Library for Simulating Gene Regulatory Network Dynamics | Mar 24, 2025 | GPU | Code Available | 0 |
| Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | Mar 24, 2025 | GPU, Large Language Model | Unverified | 0 |
| WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Mar 23, 2025 | GPU | Code Available | 0 |
| Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images | Mar 23, 2025 | Autonomous Navigation, Depth Estimation | Code Available | 0 |
| V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | Mar 21, 2025 | CPU, GPU | Unverified | 0 |
| Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability | Mar 21, 2025 | Adversarial Robustness, Bayesian Optimization | Unverified | 0 |
| Temporal Action Detection Model Compression by Progressive Block Drop | Mar 21, 2025 | Action Detection, Autonomous Driving | Unverified | 0 |
| Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | Mar 21, 2025 | GPU, Scheduling | Unverified | 0 |
| UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models | Mar 21, 2025 | GPU | Unverified | 0 |
| SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | Mar 20, 2025 | CPU, GPU | Unverified | 0 |
| GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | Mar 20, 2025 | 3DGS, GPU | Unverified | 0 |
| ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming | Mar 19, 2025 | GPU | Unverified | 0 |
| Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection | Mar 19, 2025 | Anomaly Detection, Federated Learning | Unverified | 0 |
| TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | Mar 18, 2025 | GPU, object-detection | Unverified | 0 |
| Bolt3D: Generating 3D Scenes in Seconds | Mar 18, 2025 | 3D geometry, 3D Reconstruction | Unverified | 0 |
| Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation | Mar 18, 2025 | 3DGS, GPU | Unverified | 0 |
| ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning | Mar 17, 2025 | GPU, Model Compression | Unverified | 0 |
| Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | Mar 17, 2025 | Form, GPU | Unverified | 0 |
| MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis | Mar 17, 2025 | GPU | Unverified | 0 |
| AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | Mar 17, 2025 | Chunking, GPU | Unverified | 0 |
| Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs | Mar 15, 2025 | GPU | Unverified | 0 |
| PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices | Mar 15, 2025 | GPU, Scheduling | Unverified | 0 |
| Characterizing GPU Resilience and Impact on AI/HPC Systems | Mar 14, 2025 | Attribute, GPU | Unverified | 0 |
| Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | Mar 14, 2025 | GPU, Mamba | Unverified | 0 |
| Distance-Based Tree-Sliced Wasserstein Distance | Mar 14, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression | Mar 14, 2025 | GPU | Unverified | 0 |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Mar 14, 2025 | GPU | Code Available | 0 |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Mar 14, 2025 | Deep Learning, GPU | Code Available | 0 |
| OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | Mar 13, 2025 | channel selection, Contrastive Learning | Unverified | 0 |
| KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | Unverified | 0 |
| Speedy MASt3R | Mar 13, 2025 | 3D Scene Reconstruction, GPU | Unverified | 0 |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Mar 12, 2025 | GPU | Code Available | 0 |
| Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mar 12, 2025 | Blocking, GPU | Unverified | 0 |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Mar 12, 2025 | CPU, GPU | Unverified | 0 |
| VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | Mar 12, 2025 | GPU, Streaming video understanding | Unverified | 0 |
| MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Mar 12, 2025 | Benchmarking, GPU | Unverified | 0 |
| Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Mar 11, 2025 | GPU | Code Available | 0 |
| Accelerating MoE Model Inference with Expert Sharding | Mar 11, 2025 | Decoder, GPU | Unverified | 0 |
| TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting | Mar 11, 2025 | GPU | Unverified | 0 |
| AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution | Mar 10, 2025 | GPU, Super-Resolution | Unverified | 0 |
| Global Context Is All You Need for Parallel Efficient Tractography Parcellation | Mar 10, 2025 | All, Data Augmentation | Unverified | 0 |