SOTAVerified

GPU

Papers

Showing 601-650 of 5629 papers

Title | Status | Hype
Brain-inspired sparse training enables Transformers and LLMs to perform as fully connected | | 0
Rethinking Diffusion Posterior Sampling: From Conditional Score Estimator to Maximizing a Posterior | Code | 1
LLM-based Affective Text Generation Quality Based on Different Quantization Values | | 0
Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations | | 0
adabmDCA 2.0 -- a flexible but easy-to-use package for Direct Coupling Analysis | Code | 0
CrowdSplat: Exploring Gaussian Splatting For Crowd Rendering | Code | 0
Assessing the Capability of YOLO- and Transformer-based Object Detectors for Real-time Weed Detection | | 0
One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning | | 0
Return of the Encoder: Maximizing Parameter Efficiency for SLMs | Code | 1
Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | | 0
PISCO: Pretty Simple Compression for Retrieval-Augmented Generation | | 0
EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7
Towards Scalable Topological Regularizers | | 0
An Efficient Sparse Kernel Generator for O(3)-Equivariant Deep Networks | Code | 2
3DGS^2: Near Second-order Converging 3D Gaussian Splatting | | 0
HEPPO: Hardware-Efficient Proximal Policy Optimization -- A Universal Pipelined Architecture for Generalized Advantage Estimation | | 0
Irrational Complex Rotations Empower Low-bit Optimizers | | 0
Learning Versatile Optimizers on a Compute Diet | Code | 0
GANQ: GPU-Adaptive Non-Uniform Quantization for Large Language Models | Code | 0
Pushing the Limits of BFP on Narrow Precision LLM Inference | | 0
EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7
TOFFE -- Temporally-binned Object Flow from Events for High-speed and Energy-Efficient Object Detection and Tracking | | 0
Recurrent Diffusion for Large-Scale Parameter Generation | Code | 2
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks? | Code | 3
MOFA: Discovering Materials for Carbon Capture with a GenAI- and Simulation-Based Workflow | | 0
Revisiting Ensemble Methods for Stock Trading and Crypto Trading Tasks at ACM ICAIF FinRL Contest 2023-2024 | | 0
No More Sliding Window: Efficient 3D Medical Image Segmentation with Differentiable Top-k Patch Sampling | | 0
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | | 0
Good things come in small packages: Should we build AI clusters with Lite-GPUs? | | 0
PixelBrax: Learning Continuous Control from Pixels End-to-End on the GPU | Code | 0
The Streaming Batch Model for Efficient and Fault-Tolerant Heterogeneous Execution | | 0
FASP: Fast and Accurate Structured Pruning of Large Language Models | | 0
Resource-Constrained Federated Continual Learning: What Does Matter? | | 0
GS-LIVO: Real-Time LiDAR, Inertial, and Visual Multi-sensor Fused Odometry with Gaussian Mapping | | 0
Towards Lightweight and Stable Zero-shot TTS with Self-distilled Representation Disentanglement | | 0
Towards Lightweight Time Series Forecasting: a Patch-wise Transformer with Weak Data Enriching | | 0
Keras Sig: Efficient Path Signature Computation on GPU in Keras 3 | | 0
Physics-Informed Latent Neural Operator for Real-time Predictions of Complex Physical Systems | | 0
CuAsmRL: Optimizing GPU SASS Schedules via Deep Reinforcement Learning | Code | 1
Hierarchical Autoscaling for Large Language Model Serving with Chiron | | 0
A User's Guide to KSig: GPU-Accelerated Computation of the Signature Kernel | Code | 2
Generalized and Efficient 2D Gaussian Splatting for Arbitrary-scale Super-Resolution | Code | 2
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping | Code | 1
Ultra Memory-Efficient On-FPGA Training of Transformers via Tensor-Compressed Optimization | | 0
Towards Early Prediction of Self-Supervised Speech Model Performance | | 0
TakuNet: an Energy-Efficient CNN for Real-Time Inference on Embedded UAV systems in Emergency Response Scenarios | Code | 2
MS-Temba : Multi-Scale Temporal Mamba for Efficient Temporal Action Detection | Code | 1
Benchmarking Rotary Position Embeddings for Automatic Speech Recognition | | 0
EXION: Exploiting Inter- and Intra-Iteration Output Sparsity for Diffusion Models | | 0
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters | | 0
Page 13 of 113

No leaderboard results yet.