SOTAVerified

GPU Papers

Showing 51–100 of 5,629 papers

Title | Status | Hype
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Code | 5
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Code | 5
KBLaM: Knowledge Base augmented Language Model | Code | 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion | Code | 5
AudioLCM: Text-to-Audio Generation with Latent Consistency Models | Code | 5
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Code | 5
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Code | 5
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Code | 5
ReLoRA: High-Rank Training Through Low-Rank Updates | Code | 5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Code | 5
EfficientRep: An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Code | 5
YOLOv6 v3.0: A Full-Scale Reloading | Code | 5
Orbit: A Unified Simulation Framework for Interactive Robot Learning Environments | Code | 5
Point-E: A System for Generating 3D Point Clouds from Complex Prompts | Code | 5
Deep Lake: a Lakehouse for Deep Learning | Code | 5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Code | 5
Multi-head Temporal Latent Attention | Code | 4
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation | Code | 4
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | Code | 4
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Code | 4
LettuceDetect: A Hallucination Detection Framework for RAG Applications | Code | 4
Building reliable sim driving agents by scaling self-play | Code | 4
KernelBench: Can LLMs Write Efficient GPU Kernels? | Code | 4
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Code | 4
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization | Code | 4
SocialED: A Python Library for Social Event Detection | Code | 4
SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models | Code | 4
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads | Code | 4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding | Code | 4
EmbodiedSAM: Online Segment Any 3D Thing in Real Time | Code | 4
Deep Patch Visual SLAM | Code | 4
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Code | 4
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Code | 4
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Code | 4
On Scaling Up 3D Gaussian Splatting Training | Code | 4
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Code | 4
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Code | 4
Looking Backward: Streaming Video-to-Video Translation with Feature Banks | Code | 4
Vidur: A Large-Scale Simulation Framework For LLM Inference | Code | 4
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Code | 4
Mamba-FETrack: Frame-Event Tracking via State Space Model | Code | 4
JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Code | 4
Page 2 of 113
