SOTAVerified

GPU

Papers

Showing 41-50 of 5629 papers

| Title | Status | Hype |
|---|---|---|
| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6 |
| SqueezeLLM: Dense-and-Sparse Quantization | Code | 6 |
| QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6 |
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6 |
| Group-in-Group Policy Optimization for LLM Agent Training | Code | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5 |
| Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Code | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Code | 5 |
Page 5 of 563

No leaderboard results yet.