SOTAVerified

GPU

Papers

Showing 301-350 of 5629 papers

Title | Status | Hype
LoRA: Low-Rank Adaptation of Large Language Models | Code | 2
LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search | Code | 2
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism | Code | 2
Accelerated Quality-Diversity through Massive Parallelism | Code | 2
LoQT: Low-Rank Adapters for Quantized Pretraining | Code | 2
Low-Rank Quantization-Aware Training for LLMs | Code | 2
LongRecipe: Recipe for Efficient Long Context Generalization in Large Language Models | Code | 2
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models | Code | 2
Low-resource finetuning of foundation models beats state-of-the-art in histopathology | Code | 2
LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models | Code | 2
DISTFLASHATTN: Distributed Memory-efficient Attention for Long-context LLMs Training | Code | 2
LightSeq2: Accelerated Training for Transformer-based Models on GPUs | Code | 2
Cross-domain Neural Pitch and Periodicity Estimation | Code | 2
LightSeq: A High Performance Inference Library for Transformers | Code | 2
λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
360MonoDepth: High-Resolution 360° Monocular Depth Estimation | Code | 2
Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning | Code | 2
A Case Study in CUDA Kernel Fusion: Implementing FlashAttention-2 on NVIDIA Hopper Architecture using the CUTLASS Library | Code | 2
LeanDojo: Theorem Proving with Retrieval-Augmented Language Models | Code | 2
Latent Neural Operator for Solving Forward and Inverse PDE Problems | Code | 2
Learning to Fly in Seconds | Code | 2
LightGen: Efficient Image Generation through Knowledge Distillation and Direct Preference Optimization | Code | 2
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation | Code | 2
JAX, M.D.: A Framework for Differentiable Physics | Code | 2
JAX MD: A Framework for Differentiable Physics | Code | 2
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model | Code | 2
CoMoSVC: Consistency Model-based Singing Voice Conversion | Code | 2
2nd Place Solution for Waymo Open Dataset Challenge -- Real-time 2D Object Detection | Code | 2
Cross-Scale MAE: A Tale of Multi-Scale Exploitation in Remote Sensing | Code | 2
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX | Code | 2
LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | Code | 2
Instant Volumetric Head Avatars | Code | 2
CoLLiE: Collaborative Training of Large Language Models in an Efficient Way | Code | 2
2nd Place Solution for Waymo Open Dataset Challenge - Real-time 2D Object Detection | Code | 2
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization | Code | 2
Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2
CoLA: Exploiting Compositional Structure for Automatic and Efficient Numerical Linear Algebra | Code | 2
ImMesh: An Immediate LiDAR Localization and Meshing Framework | Code | 2
InPars Toolkit: A Unified and Reproducible Synthetic Data Generation Pipeline for Neural Information Retrieval | Code | 2
Invertible Diffusion Models for Compressed Sensing | Code | 2
HybridDepth: Robust Metric Depth Fusion by Leveraging Depth from Focus and Single-Image Priors | Code | 2
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Code | 2
HRDA: Context-Aware High-Resolution Domain-Adaptive Semantic Segmentation | Code | 2
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Code | 2
I-BERT: Integer-only BERT Quantization | Code | 2
HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis | Code | 2
Holistically-Attracted Wireframe Parsing: From Supervised to Self-Supervised Learning | Code | 2
Im4D: High-Fidelity and Real-Time Novel View Synthesis for Dynamic Scenes | Code | 2
Isaac Gym: High Performance GPU-Based Physics Simulation For Robot Learning | Code | 2
Page 7 of 113

No leaderboard results yet.