SOTAVerified

GPU

Papers

Showing 1951–2000 of 5629 papers

Title | Status | Hype
Stochastic Engrams for Efficient Continual Learning with Binarized Neural Networks | - | 0
Robust DNN Partitioning and Resource Allocation Under Uncertain Inference Time | - | 0
Self-ReS: Self-Reflection in Large Vision-Language Models for Long Video Understanding | - | 0
High Quality Diffusion Distillation on a Single GPU with Relative and Absolute Position Matching | - | 0
AdaptiVocab: Enhancing LLM Efficiency in Focused Domains through Lightweight Vocabulary Adaptation | Code | 0
Optimizing Breast Cancer Detection in Mammograms: A Comprehensive Study of Transfer Learning, Resolution Reduction, and Multi-View Classification | - | 0
PyGraph: Robust Compiler Support for CUDA Graphs in PyTorch | - | 0
Improved Alignment of Modalities in Large Vision Language Models | - | 0
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding | - | 0
GRiNS: A Python Library for Simulating Gene Regulatory Network Dynamics | Code | 0
Oaken: Fast and Efficient LLM Serving with Online-Offline Hybrid KV Cache Quantization | - | 0
WindowKV: Task-Adaptive Group-Wise KV Cache Window Selection for Efficient LLM Inference | Code | 0
Co-SemDepth: Fast Joint Semantic Segmentation and Depth Estimation on Aerial Images | Code | 0
V-Seek: Accelerating LLM Reasoning on Open-hardware Server-class RISC-V Platforms | - | 0
Robustness of deep learning classification to adversarial input on GPUs: asynchronous parallel accumulation is a source of vulnerability | - | 0
Temporal Action Detection Model Compression by Progressive Block Drop | - | 0
Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation | - | 0
UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models | - | 0
SpeCache: Speculative Key-Value Caching for Efficient Generation of LLMs | - | 0
GauRast: Enhancing GPU Triangle Rasterizers to Accelerate 3D Gaussian Splatting | - | 0
ML-Triton, A Multi-Level Compilation and Language Extension to Triton GPU Programming | - | 0
Reducing Communication Overhead in Federated Learning for Network Anomaly Detection with Adaptive Client Selection | - | 0
TGBFormer: Transformer-GraphFormer Blender Network for Video Object Detection | - | 0
Bolt3D: Generating 3D Scenes in Seconds | - | 0
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation | - | 0
ClusComp: A Simple Paradigm for Model Compression and Efficient Finetuning | - | 0
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory | - | 0
MagicDistillation: Weak-to-Strong Video Distillation for Large-Scale Few-Step Synthesis | - | 0
AccelGen: Heterogeneous SLO-Guaranteed High-Throughput LLM Inference Serving for Diverse Applications | - | 0
Changing Base Without Losing Pace: A GPU-Efficient Alternative to MatMul in DNNs | - | 0
PIPO: Pipelined Offloading for Efficient Inference on Consumer Devices | - | 0
Characterizing GPU Resilience and Impact on AI/HPC Systems | - | 0
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers | - | 0
Distance-Based Tree-Sliced Wasserstein Distance | Code | 0
X-EcoMLA: Upcycling Pre-Trained Attention into MLA for Efficient and Extreme KV Compression | - | 0
LLMPerf: GPU Performance Modeling meets Large Language Models | Code | 0
Cost-effective Deep Learning Infrastructure with NVIDIA GPU | Code | 0
OuroMamba: A Data-Free Quantization Framework for Vision Mamba Models | - | 0
KV-Distill: Nearly Lossless Learnable Context Compression for LLMs | - | 0
Speedy MASt3R | - | 0
MoE-Gen: High-Throughput MoE Inference on a Single GPU with Module-Based Batching | Code | 0
Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference | - | 0
Sometimes Painful but Certainly Promising: Feasibility and Trade-offs of Language Model Inference at the Edge | - | 0
VideoScan: Enabling Efficient Streaming Video Understanding via Frame-level Semantic Carriers | - | 0
MarineGym: A High-Performance Reinforcement Learning Platform for Underwater Robotics | - | 0
Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference | Code | 0
Accelerating MoE Model Inference with Expert Sharding | - | 0
TT-GaussOcc: Test-Time Compute for Self-Supervised Occupancy Prediction via Spatio-Temporal Gaussian Splatting | - | 0
AdaptSR: Low-Rank Adaptation for Efficient and Scalable Real-World Super-Resolution | - | 0
Global Context Is All You Need for Parallel Efficient Tractography Parcellation | - | 0
Page 40 of 113
