SOTAVerified

GPU

Papers

Showing 301–350 of 5629 papers

Title | Status | Hype
Momentum-GS: Momentum Gaussian Self-Distillation for High-Quality Large Scene Reconstruction | Code | 2
Low-resource finetuning of foundation models beats state-of-the-art in histopathology | Code | 2
Low-Rank Quantization-Aware Training for LLMs | Code | 2
Forecasting GPU Performance for Deep Learning Training and Inference | Code | 2
Accelerated Quality-Diversity through Massive Parallelism | Code | 2
LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Code | 2
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Code | 2
LoQT: Low-Rank Adapters for Quantized Pretraining | Code | 2
cuSLINK: Single-linkage Agglomerative Clustering on the GPU | Code | 2
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Code | 2
LoRA: Low-Rank Adaptation of Large Language Models | Code | 2
Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Code | 2
Cross-domain Neural Pitch and Periodicity Estimation | Code | 2
CrypTen: Secure Multi-Party Computation Meets Machine Learning | Code | 2
LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 2
LightSeq2: Accelerated Training for Transformer-based Models on GPUs | Code | 2
Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Code | 2
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Code | 2
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space | Code | 2
AutoTriton: Automatic Triton Programming with Reinforcement Learning in LLMs | Code | 2
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning | Code | 2
LightSeq: A High Performance Inference Library for Transformers | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Code | 2
Latent Neural Operator for Solving Forward and Inverse PDE Problems | Code | 2
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Code | 2
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Code | 2
2nd Place Solution for Waymo Open Dataset Challenge -- Real-time 2D Object Detection | Code | 2
Learning to Fly in Seconds | Code | 2
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | Code | 2
A User's Guide to KSig: GPU-Accelerated Computation of the Signature Kernel | Code | 2
Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models | Code | 2
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning | Code | 2
2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection | Code | 2
MODNet: Real-Time Trimap-Free Portrait Matting via Objective Decomposition | Code | 2
Instant Volumetric Head Avatars | Code | 2
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Code | 2
InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Code | 2
Invertible Diffusion Models for Compressed Sensing | Code | 2
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Code | 2
AudioDec: An Open-source Streaming High-fidelity Neural Audio Codec | Code | 2
Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes | Code | 2
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Code | 2
I-BERT: Integer-only BERT Quantization | Code | 2
ImMesh: An Immediate LiDAR Localization and Meshing Framework | Code | 2
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Code | 2
AutoFocus: Efficient Multi-Scale Inference | Code | 2
HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation | Code | 2

No leaderboard results yet.