SOTAVerified

GPU Papers

Showing 41–50 of 5629 papers

| Title | Status | Hype |
|---|---|---|
| FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Code | 6 |
| LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models | Code | 6 |
| SqueezeLLM: Dense-and-Sparse Quantization | Code | 6 |
| FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning | Code | 6 |
| QLoRA: Efficient Finetuning of Quantized LLMs | Code | 6 |
| AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5 |
| MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5 |
| MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5 |
| DEIM: DETR with Improved Matching for Fast Convergence | Code | 5 |
| Deep Lake: a Lakehouse for Deep Learning | Code | 5 |
Page 5 of 563

No leaderboard results yet.