SOTAVerified

Mixture-of-Experts

Papers

Showing 1011–1020 of 1312 papers

Title | Status | Hype
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | - | 0
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | - | 0
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | - | 0
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | - | 0
Lifelong Language Pretraining with Distribution-Specialized Experts | - | 0
Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts | - | 0
Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition | - | 0
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - | 0
Demystifying Softmax Gating Function in Gaussian Mixture of Experts | - | 0
Page 102 of 132

No leaderboard results yet.