SOTAVerified

GPU

Papers

Showing 2326-2350 of 5629 papers

Title | Status | Hype
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization |  | 0
Context Parallelism for Scalable Million-Token Inference |  | 0
Stochastic Communication Avoidance for Recommendation Systems |  | 0
NEO: Saving GPU Memory Crisis with CPU Offloading for Online LLM Inference | Code | 0
Hollowed Net for On-Device Personalization of Text-to-Image Diffusion Models |  | 0
CRONOS: Enhancing Deep Learning with Scalable GPU Accelerated Convex Neural Networks |  | 0
HopTrack: A Real-time Multi-Object Tracking System for Embedded Devices | Code | 0
Computation-Aware Gaussian Processes: Model Selection And Linear-Time Inference |  | 0
A Novel Breast Ultrasound Image Augmentation Method Using Advanced Neural Style Transfer: An Efficient and Explainable Approach |  | 0
Cycle-Constrained Adversarial Denoising Convolutional Network for PET Image Denoising: Multi-Dimensional Validation on Large Datasets with Reader Study and Real Low-Dose Data |  | 0
Reinforcement learning with learned gadgets to tackle hard quantum problems on real hardware | Code | 0
Context-Aware Token Selection and Packing for Enhanced Vision Transformer |  | 0
ProMoE: Fast MoE-based LLM Serving using Proactive Caching |  | 0
Application of Audio Fingerprinting Techniques for Real-Time Scalable Speech Retrieval and Speech Clusterization |  | 0
Memory-Efficient Point Cloud Registration via Overlapping Region Sampling |  | 0
A Message Passing Neural Network Surrogate Model for Bond-Associated Peridynamic Material Correspondence Formulation |  | 0
Revisiting Reliability in Large-Scale Machine Learning Research Clusters |  | 0
AI-assisted Agile Propagation Modeling for Real-time Digital Twin Wireless Networks |  | 0
Motion Graph Unleashed: A Novel Approach to Video Prediction | Code | 0
Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs | Code | 0
VL-Cache: Sparsity and Modality-Aware KV Cache Compression for Vision-Language Model Inference Acceleration |  | 0
Accelerated Bayesian parameter estimation and model selection for gravitational waves with normalizing flows |  | 0
FusedInf: Efficient Swapping of DNN Models for On-Demand Serverless Inference Services on the Edge | Code | 0
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading | Code | 0
Computational Bottlenecks of Training Small-scale Large Language Models |  | 0
