SOTAVerified

GPU

Papers

Showing 5175 of 5629 papers

Title | Status | Hype
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Code | 5
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Code | 5
KBLaM: Knowledge Base augmented Language Model | Code | 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Code | 5
AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Code | 5
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Code | 5
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Code | 5
ReLoRA: High-Rank Training Through Low-Rank Updates | Code | 5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Code | 5
EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Code | 5
YOLOv6 v3.0: A Full-Scale Reloading | Code | 5
Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Code | 5
Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Code | 5
Multi-head Temporal Latent Attention | Code | 4
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | Code | 4
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | Code | 4
