SOTAVerified

CPU

Papers

Showing 1–25 of 2231 papers

Title | Status | Hype
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
Magika: AI-Powered Content-Type Detection | Code | 11
PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Code | 9
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Code | 9
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Code | 9
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7
Full Scaling Automation for Sustainable Development of Green Data Centers | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Code | 7
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Code | 5
XFeat: Accelerated Features for Lightweight Image Matching | Code | 5
Vectorized and performance-portable Quicksort | Code | 5
Fast On-device LLM Inference with NPUs | Code | 5
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Code | 5
GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Code | 4
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | Code | 4
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | Code | 4
PLAID: An Efficient Engine for Late Interaction Retrieval | Code | 4
DAMO-YOLO : A Report on Real-Time Object Detection Design | Code | 4
FFCV: Accelerating Training by Removing Data Bottlenecks | Code | 4
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Code | 4
Page 1 of 90
