Scheduling

Project or Job Scheduling

Papers

Showing 1–25 of 3104 papers

Title	Date	Tasks	Status	Hype	Score
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving	Jan 2, 2025	GPU, Scheduling	Code Available	9	5
Steering Language Models with Game-Theoretic Solvers	Jan 24, 2024	Imitation Learning, Scheduling	Code Available	9	5
PowerInfer-2: Fast Large Language Model Inference on a Smartphone	Jun 10, 2024	CPU, Language Modeling	Code Available	9	5
The Road Less Scheduled	May 24, 2024	Scheduling	Code Available	7	5
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving	Nov 27, 2024	Fairness, GPU	Code Available	7	5
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models	Feb 6, 2023	Scheduling	Code Available	7	5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models	Aug 21, 2024	GPU, Quantization	Code Available	5	5
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance	Jun 4, 2025	Benchmarking, Scheduling	Code Available	5	5
FlowTok: Flowing Seamlessly Across Text and Image Tokens	Mar 13, 2025	Denoising, Image to text	Code Available	5	5
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models	Jan 25, 2024	GPU, Scheduling	Code Available	4	5
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training	Mar 3, 2023	Federated Learning, GPU	Code Available	4	5
One Step Diffusion via Shortcut Models	Oct 16, 2024	Denoising, Scheduling	Code Available	4	5
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints	Apr 15, 2025	GPU, Inference Optimization	Code Available	4	5
Vidur: A Large-Scale Simulation Framework For LLM Inference	May 8, 2024	CPU, GPU	Code Available	4	5
Orion-14B: Open-source Multilingual Large Language Models	Jan 20, 2024	Scheduling	Code Available	4	5
PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices	May 30, 2024	Scheduling	Code Available	4	5
Fairness in Serving Large Language Models	Dec 31, 2023	Fairness, Scheduling	Code Available	3	5
Efficiently Serving LLM Reasoning Programs with Certaindex	Dec 30, 2024	Code Generation, Mathematical Problem-Solving	Code Available	3	5
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1	Oct 3, 2024	Scheduling	Code Available	3	5
MNN: A Universal and Efficient Inference Engine	Feb 27, 2020	Deep Learning, Diversity	Code Available	3	5
Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System	Apr 23, 2020	Scheduling	Code Available	3	5
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management	Oct 1, 2024	GPU, Language Modeling	Code Available	3	5
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering	Aug 15, 2024	Computational Efficiency, Scheduling	Code Available	3	5
A Survey on Large Language Model Acceleration based on KV Cache Management	Dec 27, 2024	Language Modeling, Language Modelling	Code Available	3	5
A Survey on Inference Optimization Techniques for Mixture of Experts Models	Dec 18, 2024	Computational Efficiency, Distributed Computing	Code Available	3	5
