SOTAVerified

Mixture-of-Experts

Papers

Showing 951-975 of 1312 papers

Title | Status | Hype
Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts | | 0
Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition | | 0
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0
Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising | | 0
Demystifying Softmax Gating Function in Gaussian Mixture of Experts | | 0
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity | Code | 0
Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | Code | 1
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | | 0
Revisiting Single-gated Mixtures of Experts | | 0
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | | 0
Mixed Regression via Approximate Message Passing | | 0
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Code | 1
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild | Code | 1
Steered Mixture of Experts Regression for Image Denoising with Multi-Model-Inference | | 0
Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills | Code | 0
WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal | | 0
Disguise without Disruption: Utility-Preserving Face De-Identification | | 0
Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset | | 0
Learning A Sparse Transformer Network for Effective Image Deraining | Code | 2
HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals | | 0
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | | 0
Scaling Vision-Language Models with Sparse Mixture of Experts | | 0
A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | | 0
Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | | 0
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Code | 1
Page 39 of 53

No leaderboard results yet.