SOTAVerified

Scheduling

Project or Job Scheduling

Papers

Showing 1–50 of 3104 papers

Title | Status | Hype
FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving | Code | 9
PowerInfer-2: Fast Large Language Model Inference on a Smartphone | Code | 9
Steering Language Models with Game-Theoretic Solvers | Code | 9
FastSwitch: Optimizing Context Switching Efficiency in Fairness-aware Large Language Model Serving | Code | 7
The Road Less Scheduled | Code | 7
Colossal-Auto: Unified Automation of Parallelization and Activation Checkpoint for Large-scale Models | Code | 7
AssetOpsBench: Benchmarking AI Agents for Task Automation in Industrial Asset Operations and Maintenance | Code | 5
FlowTok: Flowing Seamlessly Across Text and Image Tokens | Code | 5
MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models | Code | 5
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4
One Step Diffusion via Shortcut Models | Code | 4
PixelsDB: Serverless and NL-Aided Data Analytics with Flexible Service Levels and Prices | Code | 4
Vidur: A Large-Scale Simulation Framework For LLM Inference | Code | 4
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models | Code | 4
Orion-14B: Open-source Multilingual Large Language Models | Code | 4
FedML Parrot: A Scalable Federated Learning System via Heterogeneity-aware Scheduling on Sequential and Hierarchical Training | Code | 4
Vine Copulas as Differentiable Computational Graphs | Code | 3
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
Efficiently Serving LLM Reasoning Programs with Certaindex | Code | 3
A Survey on Large Language Model Acceleration based on KV Cache Management | Code | 3
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1 | Code | 3
LayerKV: Optimizing Large Language Model Serving with Layer-wise KV Cache Management | Code | 3
FlashGS: Efficient 3D Gaussian Splatting for Large-scale and High-resolution Rendering | Code | 3
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve | Code | 3
Fairness in Serving Large Language Models | Code | 3
Taskflow: A Lightweight Parallel and Heterogeneous Task Graph Computing System | Code | 3
MNN: A Universal and Efficient Inference Engine | Code | 3
SystolicAttention: Fusing FlashAttention within a Single Systolic Array | Code | 2
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning | Code | 2
RapFlow-TTS: Rapid and High-Fidelity Text-to-Speech with Improved Consistency Flow Matching | Code | 2
Solving the Job Shop Scheduling Problem with Graph Neural Networks: A Customizable Reinforcement Learning Environment | Code | 2
ALE-Bench: A Benchmark for Long-Horizon Objective-Driven Algorithm Engineering | Code | 2
Demystifying and Enhancing the Efficiency of Large Language Model Based Search Agents | Code | 2
Piloting Structure-Based Drug Design via Modality-Specific Optimal Schedule | Code | 2
HybriMoE: Hybrid CPU-GPU Scheduling and Cache Management for Efficient MoE Inference | Code | 2
AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World | Code | 2
AttentionEngine: A Versatile Framework for Efficient Attention Mechanisms on Diverse Hardware Platforms | Code | 2
Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge | Code | 2
RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World | Code | 2
Efficient LLM Scheduling by Learning to Rank | Code | 2
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning | Code | 2
SustainDC: Benchmarking for Sustainable Data Center Control | Code | 2
RodinHD: High-Fidelity 3D Avatar Generation with Diffusion Models | Code | 2
Teola: Towards End-to-End Optimization of LLM-based Applications | Code | 2
Chat AI: A Seamless Slurm-Native Solution for HPC-Based Services | Code | 2
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting | Code | 2
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing | Code | 2
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow | Code | 2
Self-Consistent Recursive Diffusion Bridge for Medical Image Translation | Code | 2
Page 1 of 63