SOTAVerified

GPU

Papers

Showing 26–50 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7 |
| FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7 |
| Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7 |
| ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7 |
| Scalable MatMul-free Language Modeling | Code | 7 |
| EvoRL: A GPU-accelerated Framework for Evolutionary Reinforcement Learning | Code | 7 |
| Revisiting PCA for time series reduction in temporal dimension | Code | 7 |
| Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7 |
| Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7 |
| GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7 |
| Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis | Code | 7 |
| Labeling supervised fine-tuning data with the scaling law | Code | 7 |
| Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7 |
| EvoGP: A GPU-accelerated Framework for Tree-based Genetic Programming | Code | 7 |
| PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6 |
| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6 |
| SqueezeLLM: Dense-and-Sparse Quantization | Code | 6 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6 |
| QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Code | 5 |
| Deep Lake: a Lakehouse for Deep Learning | Code | 5 |

No leaderboard results yet.