SOTAVerified

GPU

Papers

Showing 2051-2100 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| FIKIT: Priority-Based Real-time GPU Multi-tasking Scheduling with Kernel Identification | | 0 |
| EvaSurf: Efficient View-Aware Implicit Textured Surface Reconstruction | | 0 |
| JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Code | 2 |
| Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation | | 0 |
| DMV3D: Denoising Multi-View Diffusion using 3D Large Reconstruction Model | | 0 |
| 4K-Resolution Photo Exposure Correction at 125 FPS with ~8K Parameters | Code | 1 |
| Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster | Code | 2 |
| A GPU-Accelerated Moving-Horizon Algorithm for Training Deep Classification Trees on Large Datasets | | 0 |
| InfMLLM: A Unified Framework for Visual-Language Tasks | Code | 1 |
| PerceptionGPT: Effectively Fusing Visual Perception into LLM | | 0 |
| LCM-LoRA: A Universal Stable-Diffusion Acceleration Module | Code | 4 |
| GPU-Accelerated WFST Beam Search Decoder for CTC-based Speech Recognition | Code | 1 |
| Evaluating Emerging AI/ML Accelerators: IPU, RDU, and NVIDIA/AMD GPUs | | 0 |
| LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Code | 5 |
| A Comprehensive Summarization and Evaluation of Feature Refinement Modules for CTR Prediction | Code | 0 |
| DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining | | 0 |
| Input Reconstruction Attack against Vertical Federated Large Language Models | | 0 |
| Estimator-Coupled Reinforcement Learning for Robust Purely Tactile In-Hand Manipulation | | 0 |
| Prompt Cache: Modular Attention Reuse for Low-Latency Inference | Code | 1 |
| Black-Box Prompt Optimization: Aligning Large Language Models without Model Training | Code | 2 |
| Dissecting the Runtime Performance of the Training, Fine-tuning, and Inference of Large Language Models | | 0 |
| Distributed Matrix-Based Sampling for Graph Neural Network Training | | 0 |
| Weight-Sharing Regularization | Code | 0 |
| S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Code | 3 |
| VR-NeRF: High-Fidelity Virtualized Walkable Spaces | Code | 1 |
| Ultra-Long Sequence Distributed Transformer | | 0 |
| Augmentation is AUtO-Net: Augmentation-Driven Contrastive Multiview Learning for Medical Image Segmentation | | 0 |
| Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU | | 0 |
| Zero Coordinate Shift: Whetted Automatic Differentiation for Physics-informed Operator Learning | Code | 0 |
| A Two-Step Framework for Multi-Material Decomposition of Dual Energy Computed Tomography from Projection Domain | | 0 |
| LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B | | 0 |
| In Search of Lost Online Test-time Adaptation: A Survey | Code | 1 |
| StairNet: Visual Recognition of Stairs for Human-Robot Locomotion | | 0 |
| Network Contention-Aware Cluster Scheduling with Reinforcement Learning | Code | 1 |
| DiffusionVID: Denoising Object Boxes with Spatio-temporal Conditioning for Video Object Detection | Code | 1 |
| Learning to love diligent trolls: Accounting for rater effects in the dialogue safety task | Code | 0 |
| FetusMapV2: Enhanced Fetal Pose Estimation in 3D Ultrasound | | 0 |
| Prediction of Effective Elastic Moduli of Rocks using Graph Neural Networks | Code | 1 |
| PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices | | 0 |
| Bespoke Solvers for Generative Flow Models | | 0 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Code | 1 |
| Atom: Low-bit Quantization for Efficient and Accurate LLM Serving | Code | 2 |
| The Synergy of Speculative Decoding and Batching in Serving Large Language Models | | 0 |
| Punica: Multi-Tenant LoRA Serving | Code | 3 |
| OpenDMC: An Open-Source Library and Performance Evaluation for Deep-learning-based Multi-frame Compression | Code | 0 |
| FP8-LM: Training FP8 Large Language Models | Code | 2 |
| LLMSTEP: LLM proofstep suggestions in Lean | Code | 1 |
| Real-Time Neural Materials using Block-Compressed Features | | 0 |
| PockEngine: Sparse and Efficient Fine-tuning in a Pocket | | 0 |
| TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Code | 3 |

No leaderboard results yet.