SOTAVerified

Inference Optimization

Papers

Showing 1-50 of 56 papers

Title | Status | Hype
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Code | 0
The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | - | 0
Brevity is the soul of sustainability: Characterizing LLM response lengths | Code | 0
DSMentor: Enhancing Data Science Agents with Curriculum Learning and Online Knowledge Accumulation | - | 0
Faster MoE LLM Inference for Extremely Large Models | - | 0
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Code | 3
Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4
The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5
Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification | - | 0
Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | - | 0
DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis | - | 0
Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals | - | 0
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
FluidML: Fast and Memory Efficient Inference Optimization | - | 0
A Temporal Linear Network for Time Series Forecasting | Code | 0
LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | Code | 0
EdgeRL: Reinforcement Learning-driven Deep Learning Model Inference Optimization at Edge | - | 0
CycleBNN: Cyclic Precision Training in Binary Neural Networks | Code | 2
Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | - | 0
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | - | 0
An approach to optimize inference of the DIART speaker diarization pipeline | - | 0
LLaSA: Large Language and E-Commerce Shopping Assistant | Code | 0
Patched MOA: optimizing inference for diverse software development tasks | Code | 0
Inference Optimization of Foundation Models on AI Accelerators | - | 0
Inference Performance Optimization for Large Language Models on CPUs | Code | 3
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | - | 0
Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval | - | 0
Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks | - | 0
Advances and Open Challenges in Federated Foundation Models | - | 0
Federated Learning While Providing Model as a Service: Joint Training and Inference Optimization | - | 0
Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition | - | 0
SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions | - | 0
Bayesian Active Learning in the Presence of Nuisance Parameters | - | 0
Representing Edge Flows on Graphs via Sparse Cell Complexes | Code | 0
Painterly Image Harmonization using Diffusion Model | Code | 1
Residual-Based Error Corrector Operator to Enhance Accuracy and Reliability of Neural Operator Surrogates of Nonlinear Variational Boundary-Value Problems | - | 0
Adaptive Deep Neural Network Inference Optimization with EENet | Code | 1
Networked Signal and Information Processing | - | 0
Enhanced graph-learning schemes driven by similar distributions of motifs | Code | 0
Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation | - | 0
Easy and Efficient Transformer: Scalable Inference Solution For Large NLP Model | - | 0
SBbadger: Biochemical Reaction Networks with Definable Degree Distributions | - | 0
ADJUST: A Dictionary-Based Joint Reconstruction and Unmixing Method for Spectral Tomography | Code | 1
A Novel 1D State Space for Efficient Music Rhythmic Analysis | Code | 1
Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization | - | 0
Developing efficient transfer learning strategies for robust scene recognition in mobile robotics using pre-trained convolutional neural networks | - | 0
Easy and Efficient Transformer : Scalable Inference Solution For large NLP model | Code | 1
Investigations on the inference optimization techniques and their impact on multiple hardware platforms for Semantic Segmentation | - | 0
SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | - | 0
A bi-partite generative model framework for analyzing and simulating large scale multiple discrete-continuous travel behaviour data | - | 0
Page 1 of 2

No leaderboard results yet.