SOTAVerified

CPU

Papers

Showing 1–50 of 2231 papers

Title | Status | Hype
WebLLM: A High-Performance In-Browser LLM Inference Engine | Code | 11
Magika: AI-Powered Content-Type Detection | Code | 11
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction | Code | 9
Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models | Code | 9
PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Code | 9
Chinese-Vicuna: A Chinese Instruction-following Llama-based Model | Code | 7
Bridging Evolutionary Multiobjective Optimization and GPU Acceleration via Tensorization | Code | 7
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving | Code | 7
Full Scaling Automation for Sustainable Development of Green Data Centers | Code | 7
Elixir: Train a Large Language Model on a Small GPU Cluster | Code | 7
Fast On-device LLM Inference with NPUs | Code | 5
XFeat: Accelerated Features for Lightweight Image Matching | Code | 5
Extreme Compression of Large Language Models via Additive Quantization | Code | 5
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU | Code | 5
Faster Segment Anything: Towards Lightweight SAM for Mobile Applications | Code | 5
FlexGen: High-Throughput Generative Inference of Large Language Models with a Single GPU | Code | 5
Vectorized and performance-portable Quicksort | Code | 5
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float | Code | 4
SocialED: A Python Library for Social Event Detection | Code | 4
InternLM2.5-StepProver: Advancing Automated Theorem Proving via Expert Iteration on Large-Scale LEAN Problems | Code | 4
Data-Prep-Kit: getting your data ready for LLM application development | Code | 4
SigmaRL: A Sample-Efficient and Generalizable Multi-Agent Reinforcement Learning Framework for Motion Planning | Code | 4
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge | Code | 4
Look Once to Hear: Target Speech Hearing with Noisy Examples | Code | 4
Vidur: A Large-Scale Simulation Framework For LLM Inference | Code | 4
Couler: Unified Machine Learning Workflow Optimization in Cloud | Code | 4
Tiny Time Mixers (TTMs): Fast Pre-trained Models for Enhanced Zero/Few-Shot Forecasting of Multivariate Time Series | Code | 4
FFCV: Accelerating Training by Removing Data Bottlenecks | Code | 4
DeepFilterNet: Perceptually Motivated Real-Time Speech Enhancement | Code | 4
DAMO-YOLO: A Report on Real-Time Object Detection Design | Code | 4
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4
EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction | Code | 4
PLAID: An Efficient Engine for Late Interaction Retrieval | Code | 4
DeepFilterNet2: Towards Real-Time Speech Enhancement on Embedded Devices for Full-Band Audio | Code | 4
The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models | Code | 4
GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles | Code | 4
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
GPU-accelerated Evolutionary Many-objective Optimization Using Tensorized NSGA-III | Code | 3
ZO2: Scalable Zeroth-Order Fine-Tuning for Extremely Large Language Models with Limited GPU Memory | Code | 3
Nd-BiMamba2: A Unified Bidirectional Architecture for Multi-Dimensional Data Processing | Code | 3
ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference | Code | 3
MagicPIG: LSH Sampling for Efficient LLM Generation | Code | 3
vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving | Code | 3
Inference Performance Optimization for Large Language Models on CPUs | Code | 3
NGD-SLAM: Towards Real-Time Dynamic SLAM without GPU | Code | 3
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Code | 3
MobileVLM: A Fast, Strong and Open Vision Language Assistant for Mobile Devices | Code | 3
XuanCe: A Comprehensive and Unified Deep Reinforcement Learning Library | Code | 3
Take the aTrain. Introducing an Interface for the Accessible Transcription of Interviews | Code | 3
Unlimiformer: Long-Range Transformers with Unlimited Length Input | Code | 3
Page 1 of 45