SOTAVerified

GPU

Papers

Showing 61-70 of 5629 papers

Title | Status | Hype
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Code | 5
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Code | 5
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Code | 5
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Code | 5

No leaderboard results yet.