SOTAVerified

Inference Optimization

Papers

Showing 1–25 of 56 papers

Title | Status | Hype
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Code | 0
The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | - | 0
Brevity is the soul of sustainability: Characterizing LLM response lengths | Code | 0
DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation | - | 0
Faster MoE LLM Inference for Extremely Large Models | - | 0
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Code | 3
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification | - | 0
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | - | 0
DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis | - | 0
Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals | - | 0
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
FluidML: Fast and Memory Efficient Inference Optimization | - | 0
A Temporal Linear Network for Time Series Forecasting | Code | 0
LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | Code | 0
EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | - | 0
CycleBNN: Cyclic Precision Training in Binary Neural Networks | Code | 2
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | - | 0
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | - | 0
An approach to optimize inference of the DIART speaker diarization pipeline | - | 0
LLaSA: Large Language and E-Commerce Shopping Assistant | Code | 0
Patched MOA: optimizing inference for diverse software development tasks | Code | 0
Inference Optimization of Foundation Models on AI Accelerators | - | 0
Inference Performance Optimization for Large Language Models on CPUs | Code | 3
Page 1 of 3

No leaderboard results yet.