SOTAVerified

GPU

Papers

Showing 51-100 of 5629 papers

Title | Status | Hype
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention | Code | 5
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning | Code | 5
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
LongQLoRA: Efficient and Effective Method to Extend Context Length of Large Language Models | Code | 5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications | Code | 5
YOLOv6 v3.0: A Full-Scale Reloading | Code | 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale | Code | 5
XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models | Code | 5
ReLoRA: High-Rank Training Through Low-Rank Updates | Code | 5
Group-in-Group Policy Optimization for LLM Agent Training | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
EfficientRep:An Efficient Repvgg-style ConvNets with Hardware-aware Neural Network Design | Code | 5
KBLaM: Knowledge Base augmented Language Model | Code | 5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Code | 5
DEIM: DETR with Improved Matching for Fast Convergence | Code | 5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Code | 5
TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second | Code | 5
Representing Long Volumetric Video with Temporal Gaussian Hierarchy | Code | 5
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis | Code | 4
GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Code | 4
Deep Patch Visual SLAM | Code | 4
PIN-SLAM: LiDAR SLAM Using a Point-Based Implicit Neural Representation for Achieving Global Map Consistency | Code | 4
Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees | Code | 4
GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS | Code | 4
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4
On Scaling Up 3D Gaussian Splatting Training | Code | 4
Otter: A Multi-Modal Model with In-Context Instruction Tuning | Code | 4
PLAID: An Efficient Engine for Late Interaction Retrieval | Code | 4
Multi-head Temporal Latent Attention | Code | 4
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion | Code | 4
CoTracker: It is Better to Track Together | Code | 4
fVDB: A Deep-Learning Framework for Sparse, Large-Scale, and High-Performance Spatial Intelligence | Code | 4
NNsight and NDIF: Democratizing Access to Open-Weight Foundation Model Internals | Code | 4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4
Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation | Code | 4
Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training | Code | 4
OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit | Code | 4
Mamba-FETrack: Frame-Event Tracking via State Space Model | Code | 4
Mamba YOLO: A Simple Baseline for Object Detection with State Space Model | Code | 4
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Code | 4
FFCV: Accelerating Training by Removing Data Bottlenecks | Code | 4
Looking Backward: Streaming Video-to-Video Translation with Feature Banks | Code | 4
Building reliable sim driving agents by scaling self-play | Code | 4
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Code | 4
fastai: A Layered API for Deep Learning | Code | 4
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token | Code | 4
Page 2 of 113

No leaderboard results yet.