| Distance-Based Tree-Sliced Wasserstein Distance | Mar 14, 2025 | Computational Efficiency, GPU | Code Available | 0 |
| LLMPerf: GPU Performance Modeling meets Large Language Models | Mar 14, 2025 | GPU | Code Available | 0 |
| Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Mar 14, 2025 | Deep Learning, GPU | Code Available | 0 |
| OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | Mar 13, 2025 | channel selection, Contrastive Learning | Unverified | 0 |
| KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | Mar 13, 2025 | GPU, Question Answering | Unverified | 0 |
| Speedy MASt3R | Mar 13, 2025 | 3D Scene Reconstruction, GPU | Unverified | 0 |
| Low Complexity Point Tracking of the Myocardium in 2D Echocardiography | Mar 13, 2025 | GPU, Point Tracking | Code Available | 1 |
| VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | Mar 12, 2025 | GPU, Streaming video understanding | Unverified | 0 |
| MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Mar 12, 2025 | GPU | Code Available | 0 |
| MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | Mar 12, 2025 | Benchmarking, GPU | Unverified | 0 |
| Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | Mar 12, 2025 | Blocking, GPU | Unverified | 0 |
| Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | Mar 12, 2025 | CPU, GPU | Unverified | 0 |
| TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting | Mar 11, 2025 | GPU | Unverified | 0 |
| OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models | Mar 11, 2025 | GPU, Mamba | Code Available | 2 |
| Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Mar 11, 2025 | GPU | Code Available | 0 |
| Accelerating MoE Model Inference with Expert Sharding | Mar 11, 2025 | Decoder, GPU | Unverified | 0 |
| LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Mar 11, 2025 | GPU, Image Generation | Code Available | 2 |
| Are We There Yet? A Measurement Study of Efficiency for LLM Applications on Mobile Devices | Mar 10, 2025 | CPU, GPU | Unverified | 0 |
| Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data | Mar 10, 2025 | GPU | Unverified | 0 |
| AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution | Mar 10, 2025 | GPU, Super-Resolution | Unverified | 0 |
| Short-Term Load Forecasting for AI-Data Center | Mar 10, 2025 | GPU, Load Forecasting | Unverified | 0 |
| AttFC: Attention Fully-Connected Layer for Large-Scale Face Recognition with One GPU | Mar 10, 2025 | Face Recognition, GPU | Unverified | 0 |
| Efficient Distillation of Classifier-Free Guidance using Adapters | Mar 10, 2025 | GPU | Code Available | 0 |
| Global Context Is All You Need for Parallel Efficient Tractography Parcellation | Mar 10, 2025 | All, Data Augmentation | Unverified | 0 |
| A Mesh Is Worth 512 Numbers: Spectral-domain Diffusion Modeling for High-dimension Shape Generation | Mar 9, 2025 | GPU | Unverified | 0 |