SOTAVerified

GPU

Papers

Showing 381-390 of 5629 papers

| Title | Status | Hype |
| --- | --- | --- |
| I-BERT: Integer-only BERT Quantization | Code | 2 |
| Accelerating Transformer Pre-training with 2:4 Sparsity | Code | 2 |
| HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading | Code | 2 |
| Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference | Code | 2 |
| Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Code | 2 |
| Characterization of Large Language Model Development in the Datacenter | Code | 2 |
| H_2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | Code | 2 |
| Habitat 2.0: Training Home Assistants to Rearrange their Habitat | Code | 2 |
| Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers | Code | 2 |
| Collaborative Decoding Makes Visual Auto-Regressive Modeling Efficient | Code | 2 |

No leaderboard results yet.