SOTAVerified

Inference Optimization

Papers

Showing 1–50 of 56 papers

| Title | Status | Hype |
| --- | --- | --- |
| The 1st Solution for 4th PVUW MeViS Challenge: Unleashing the Potential of Large Multimodal Models for Referring Video Segmentation | Code | 5 |
| Optimizing LLM Inference: Fluid-Guided Online Scheduling with Memory Constraints | Code | 4 |
| Inference Performance Optimization for Large Language Models on CPUs | Code | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3 |
| SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL | Code | 3 |
| CycleBNN: Cyclic Precision Training in Binary Neural Networks | Code | 2 |
| Painterly Image Harmonization using Diffusion Model | Code | 1 |
| Easy and Efficient Transformer : Scalable Inference Solution For large NLP model | Code | 1 |
| Adaptive Deep Neural Network Inference Optimization with EENet | Code | 1 |
| A Novel 1D State Space for Efficient Music Rhythmic Analysis | Code | 1 |
| ADJUST: A Dictionary-Based Joint Reconstruction and Unmixing Method for Spectral Tomography | Code | 1 |
| LLaSA: Large Language and E-Commerce Shopping Assistant | Code | 0 |
| Brevity is the soul of sustainability: Characterizing LLM response lengths | Code | 0 |
| A General Method for Amortizing Variational Filtering | Code | 0 |
| A Temporal Linear Network for Time Series Forecasting | Code | 0 |
| Enhanced graph-learning schemes driven by similar distributions of motifs | Code | 0 |
| Input Convex Neural Networks | Code | 0 |
| Iterative Amortized Inference | Code | 0 |
| LLM-Rank: A Graph Theoretical Approach to Pruning Large Language Models | Code | 0 |
| Patched MOA: optimizing inference for diverse software development tasks | Code | 0 |
| Representing Edge Flows on Graphs via Sparse Cell Complexes | Code | 0 |
| Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Code | 0 |
| Residual-Based Error Corrector Operator to Enhance Accuracy and Reliability of Neural Operator Surrogates of Nonlinear Variational Boundary-Value Problems | | 0 |
| Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks | | 0 |
| Energy-Efficient Transformer Inference: Optimization Strategies for Time Series Classification | | 0 |
| SNDCNN: Self-normalizing deep CNNs with scaled exponential linear units for speech recognition | | 0 |
| Faster MoE LLM Inference for Extremely Large Models | | 0 |
| Federated Learning While Providing Model as a Service: Joint Training and Inference Optimization | | 0 |
| FluidML: Fast and Memory Efficient Inference Optimization | | 0 |
| Hellinger-Kantorovich Gradient Flows: Global Exponential Decay of Entropy Functionals | | 0 |
| Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization | | 0 |
| Inference Optimization of Foundation Models on AI Accelerators | | 0 |
| Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization | | 0 |
| The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | | 0 |
| Investigations on the inference optimization techniques and their impact on multiple hardware platforms for Semantic Segmentation | | 0 |
| SySMOL: Co-designing Algorithms and Hardware for Neural Networks with Heterogeneous Precisions | | 0 |
| Learning to Infer | | 0 |
| An approach to optimize inference of the DIART speaker diarization pipeline | | 0 |
| Networked Signal and Information Processing | | 0 |
| Integrating Pre-Trained Speech and Language Models for End-to-End Speech Recognition | | 0 |
| Advances and Open Challenges in Federated Foundation Models | | 0 |
| The Foundation Cracks: A Comprehensive Study on Bugs and Testing Practices in LLM Libraries | | 0 |
| Bayesian Active Learning in the Presence of Nuisance Parameters | | 0 |
| Revisiting SMoE Language Models by Evaluating Inefficiencies with Task Specific Expert Pruning | | 0 |
| Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization | | 0 |
| SBbadger: Biochemical Reaction Networks with Definable Degree Distributions | | 0 |
| Scaling the Vocabulary of Non-autoregressive Models for Efficient Generative Retrieval | | 0 |
| Self-Constrained Inference Optimization on Structural Groups for Human Pose Estimation | | 0 |
| A bi-partite generative model framework for analyzing and simulating large scale multiple discrete-continuous travel behaviour data | | 0 |
| Deep Signal Recovery with One-Bit Quantization | | 0 |
Page 1 of 2

No leaderboard results yet.