SOTAVerified
GPU Papers

Showing 201–250 of 5629 papers

Title | Status | Hype
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Code | 3
Transformers Can Do Arithmetic with the Right Embeddings | Code | 3
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | Code | 3
vHeat: Building Vision Models upon Heat Conduction | Code | 3
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | Code | 3
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Code | 3
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Code | 3
SnapKV: LLM Knows What You are Looking for Before Generation | Code | 3
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Code | 3
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Code | 3
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Code | 3
Allo: A Programming Model for Composable Accelerator Design | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration | Code | 3
GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA | Code | 3
94% on CIFAR-10 in 3.29 Seconds on a Single GPU | Code | 3
The Unreasonable Ineffectiveness of the Deeper Layers | Code | 3
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting | Code | 3
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Code | 3
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Code | 3
TorchCP: A Python Library for Conformal Prediction | Code | 3
BitDelta: Your Fine-Tune May Only Be Worth One Bit | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
EscherNet: A Generative Model for Scalable View Synthesis | Code | 3
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Code | 3
Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces | Code | 3
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | Code | 3
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algorithm-System Co-Design | Code | 3
MoE-Infinity: Efficient MoE Inference on Personal Machines with Sparsity-Aware Expert Cache | Code | 3
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models | Code | 3
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation | Code | 3
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models | Code | 3
MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Code | 3
Splatter Image: Ultra-Fast Single-View 3D Reconstruction | Code | 3
S-LoRA: Serving Thousands of Concurrent LoRA Adapters | Code | 3
Punica: Multi-Tenant LoRA Serving | Code | 3
TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation | Code | 3
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation | Code | 3
nanoT5: A PyTorch Framework for Pre-training and Fine-tuning T5-style Models with Limited Resources | Code | 3
Retentive Network: A Successor to Transformer for Large Language Models | Code | 3
TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement | Code | 3
Fine-Tuning Language Models with Just Forward Passes | Code | 3
Unlimiformer: Long-Range Transformers with Unlimited Length Input | Code | 3
TorchBench: Benchmarking PyTorch with High API Surface Coverage | Code | 3
FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization | Code | 3
EvoTorch: Scalable Evolutionary Computation in Python | Code | 3
Page 5 of 113
