SOTAVerified|Agents Browse Leaderboard About Blog

GPU

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 5629 papers

Title	Date	Tasks	Status	Hype
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving	Nov 27, 2024	FairnessGPU	CodeCode Available	7
xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism	Nov 4, 2024	GPU	CodeCode Available	7
ThunderKittens: Simple, Fast, and Adorable AI Kernels	Oct 27, 2024	GPUState Space Models	CodeCode Available	7
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement	Oct 17, 2024	GPUReal-Time Object Detection	CodeCode Available	7
Pyramidal Flow Matching for Efficient Video Generative Modeling	Oct 8, 2024	GPUText-to-Video Generation	CodeCode Available	7
ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI	Oct 1, 2024	GPUImitation Learning	CodeCode Available	7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving	Jun 24, 2024	CPUGPU	CodeCode Available	7
Scalable MatMul-free Language Modeling	Jun 4, 2024	GPULanguage Modeling	CodeCode Available	7
Mirage: A Multi-Level Superoptimizer for Tensor Programs	May 9, 2024	GPUNavigate	CodeCode Available	7
Labeling supervised fine-tuning data with the scaling law	May 5, 2024	coreference-resolutionCoreference Resolution	CodeCode Available	7
Fast Timing-Conditioned Latent Audio Diffusion	Feb 7, 2024	Audio GenerationGPU	CodeCode Available	7
PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models	Jan 10, 2024	GPUImage Generation	CodeCode Available	7
Elixir: Train a Large Language Model on a Small GPU Cluster	Dec 10, 2022	CPUGPU	CodeCode Available	7
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers	Oct 31, 2022	GPULanguage Modelling	CodeCode Available	7
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors	Jul 6, 2022	2D Object DetectionGPU	CodeCode Available	7
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models	Sep 21, 2023	4kGPU	CodeCode Available	6
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning	Jul 17, 2023	GPULanguage Modeling	CodeCode Available	6
SqueezeLLM: Dense-and-Sparse Quantization	Jun 13, 2023	GPUQuantization	CodeCode Available	6
QLoRA: Efficient Finetuning of Quantized LLMs	May 23, 2023	ChatbotGPU	CodeCode Available	6
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness	May 27, 2022	16k4k	CodeCode Available	6
Group-in-Group Policy Optimization for LLM Agent Training	May 16, 2025	GPUMathematical Reasoning	CodeCode Available	5
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention	Apr 22, 2025	GPU	CodeCode Available	5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts	Feb 27, 2025	Computational EfficiencyGPU	CodeCode Available	5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy	Dec 12, 2024	GPU	CodeCode Available	5
DEIM: DETR with Improved Matching for Fast Convergence	Dec 5, 2024	Data AugmentationGPU	CodeCode Available	5

Show:10 25 50

← PrevPage 2 of 226Next →

No leaderboard results yet.