SOTA | Verified

GPU

Papers

Showing 201–225 of 5629 papers

Title | Status | Hype
Flash3D: Feed-Forward Generalisable 3D Scene Reconstruction from a Single Image | Code | 3
MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion | Code | 3
Transformers Can Do Arithmetic with the Right Embeddings | Code | 3
Various Lengths, Constant Speed: Efficient Language Modeling with Lightning Attention | Code | 3
vHeat: Building Vision Models upon Heat Conduction | Code | 3
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | Code | 3
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention | Code | 3
Andes: Defining and Enhancing Quality-of-Experience in LLM-Based Text Streaming Services | Code | 3
MixLoRA: Enhancing Large Language Models Fine-Tuning with LoRA-based Mixture of Experts | Code | 3
SnapKV: LLM Knows What You are Looking for Before Generation | Code | 3
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | Code | 3
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Code | 3
Allo: A Programming Model for Composable Accelerator Design | Code | 3
BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models | Code | 3
Tensorized NeuroEvolution of Augmenting Topologies for GPU Acceleration | Code | 3
GPU-accelerated Evolutionary Multiobjective Optimization Using Tensorized RVEA | Code | 3
94% on CIFAR-10 in 3.29 Seconds on a Single GPU | Code | 3
The Unreasonable Ineffectiveness of the Deeper Layers | Code | 3
GaussianImage: 1000 FPS Image Representation and Compression by 2D Gaussian Splatting | Code | 3
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Code | 3
Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning | Code | 3
TorchCP: A Python Library for Conformal Prediction | Code | 3
BitDelta: Your Fine-Tune May Only Be Worth One Bit | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs | Code | 3
Page 9 of 226
