SOTAVerified

GPU

Papers

Showing 26–50 of 5629 papers

Title | Status | Hype
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism | Code | 7
ThunderKittens: Simple, Fast, and Adorable AI Kernels | Code | 7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement | Code | 7
Pyramidal Flow Matching for Efficient Video Generative Modeling | Code | 7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Scalable MatMul-free Language Modeling | Code | 7
Mirage: A Multi-Level Superoptimizer for Tensor Programs | Code | 7
Labeling supervised fine-tuning data with the scaling law | Code | 7
Fast Timing-Conditioned Latent Audio Diffusion | Code | 7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers | Code | 7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors | Code | 7
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6
SqueezeLLM: Dense-and-Sparse Quantization | Code | 6
QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Code | 5
DEIM: DETR with Improved Matching for Fast Convergence | Code | 5

No leaderboard results yet.