SOTAVerified

GPU

Papers

Showing 2601–2650 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| High-Throughput SAT Sampling | Code | 0 |
| High-Resolution Deep Convolutional Generative Adversarial Networks | Code | 0 |
| Fast ES-RNN: A GPU Implementation of the ES-RNN Algorithm | Code | 0 |
| Parallel and in-process compilation of individuals for genetic programming on GPU | Code | 0 |
| Automatic Differentiation in PyTorch | Code | 0 |
| Parallel Hyperparameter Optimization Of Spiking Neural Network | Code | 0 |
| ILP-M Conv: Optimize Convolution Algorithm for Single-Image Convolution Neural Network Inference on Mobile GPUs | Code | 0 |
| Comparing Energy Efficiency of CPU, GPU and FPGA Implementations for Vision Kernels | Code | 0 |
| High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison | Code | 0 |
| Higher-Order Ratio Cycles for Fast and Globally Optimal Shape Matching | Code | 0 |
| HighEr-Resolution Network for Image Demosaicing and Enhancing | Code | 0 |
| High-quality Task Division for Large-scale Entity Alignment | Code | 0 |
| Improving the Neural GPU Architecture for Algorithm Learning | Code | 0 |
| Posterior-Guided Neural Architecture Search | Code | 0 |
| Faster object tracking pipeline for real time tracking | | 0 |
| Comparative Analysis of Open Source Frameworks for Machine Learning with Use Case in Single-Threaded and Multi-Threaded Modes | | 0 |
| Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning | | 0 |
| Architecture Search of Dynamic Cells for Semantic Video Segmentation | | 0 |
| Faster Inference of Integer SWIN Transformer by Removing the GELU Activation | | 0 |
| BNAS-v2: Memory-efficient and Performance-collapse-prevented Broad Neural Architecture Search | | 0 |
| Comparative Analysis of CPU and GPU Profiling for Deep Learning Models | | 0 |
| Faster and Smarter AutoAugment: Augmentation Policy Search Based on Dynamic Data-Clustering | | 0 |
| Compact Neural Network Solutions to Laplace's Equation in a Nanofluidic Device | | 0 |
| Architectural Implications of Embedding Dimension during GCN on CPU and GPU | | 0 |
| Fast Distributed Inference Serving for Large Language Models | | 0 |
| Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments | | 0 |
| Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective | | 0 |
| CompAct: Compressed Activations for Memory-Efficient LLM Training | | 0 |
| Fast DCTTS: Efficient Deep Convolutional Text-to-Speech | | 0 |
| Addax: Utilizing Zeroth-Order Gradients to Improve Memory Efficiency and Performance of SGD for Fine-Tuning Language Models | | 0 |
| Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization | | 0 |
| Fast-COS: A Fast One-Stage Object Detector Based on Reparameterized Attention Vision Transformer for Autonomous Driving | | 0 |
| FastCHGNet: Training one Universal Interatomic Potential to 1.5 Hours with 32 GPUs | | 0 |
| Communication Optimization for Distributed Training: Architecture, Advances, and Opportunities | | 0 |
| Communication-Free Distributed GNN Training with Vertex Cut | | 0 |
| Fast Back-Projection for Non-Line of Sight Reconstruction | | 0 |
| Communication-Efficient TeraByte-Scale Model Training Framework for Online Advertising | | 0 |
| ARAP-GS: Drag-driven As-Rigid-As-Possible 3D Gaussian Splatting Editing with Diffusion Prior | | 0 |
| FastAttention: Extend FlashAttention2 to NPUs and Low-resource GPUs | | 0 |
| Fast and Scalable Optimal Transport for Brain Tractograms | | 0 |
| A Random Gossip BMUF Process for Neural Language Modeling | | 0 |
| Fast and Scalable Distributed Deep Convolutional Autoencoder for fMRI Big Data Analytics | | 0 |
| Fast and Robust Hand Tracking Using Detection-Guided Optimization | | 0 |
| Fast and parallel decoding for transducer | | 0 |
| Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs | | 0 |
| A Data-Driven Approach to Dataflow-Aware Online Scheduling for Graph Neural Network Inference | | 0 |
| Accelerating Flow-Matching-Based Text-to-Speech via Empirically Pruned Step Sampling | | 0 |
| 3D helical CT Reconstruction with a Memory Efficient Learned Primal-Dual Architecture | | 0 |
| Fast and Efficient Once-For-All Networks for Diverse Hardware Deployment | | 0 |
| ComboNet: Combined 2D & 3D Architecture for Aorta Segmentation | | 0 |

No leaderboard results yet.