SOTAVerified

CPU

Papers

Showing 26-50 of 2231 papers

Title | Status | Hype
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Code | 4
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | Code | 4
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Code | 4
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Code | 4
Look Once to Hear: Target Speech Hearing with Noisy Examples | Code | 4
DAMO-YOLO : A Report on Real-Time Object Detection Design | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Vidur: A Large-Scale Simulation Framework For LLM Inference | Code | 4
Couler: Unified Machine Learning Workflow Optimization in Cloud | Code | 4
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | Code | 4
GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Unlimiformer: Long-Range Transformers with Unlimited Length Input | Code | 3
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates | Code | 3
SoundStream: An End-to-End Neural Audio Codec | Code | 3
Performance Analysis of Open Source Machine Learning Frameworks for Various Parameters in Single-Threaded and Multi-Threaded Modes | Code | 3
Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Code | 3
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | Code | 3
Observation-Centric SORT: Rethinking SORT for Robust Multi-Object Tracking | Code | 3
MagicPIG: LSH Sampling for Efficient LLM Generation | Code | 3
A GPU-specialized Inference Parameter Server for Large-Scale Deep Recommendation Models | Code | 3
Inference Performance Optimization for Large Language Models on CPUs | Code | 3
Page 2 of 90

No leaderboard results yet.