| Title | Date | Tags |
|---|---|---|
| Exploring Routing Strategies for Multilingual Mixture-of-Experts Models | Jan 1, 2021 | Decoder, Mixture-of-Experts |
| A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | Mar 11, 2023 | Mixture-of-Experts |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling |
| Exploring Domain Robust Lightweight Reward Models based on Router Mechanism | Jul 24, 2024 | Language Modeling |
| Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design | Apr 2, 2025 | Attribute, Mixture-of-Experts |
| CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | Feb 3, 2025 | Mixture-of-Experts |
| Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities | Mar 28, 2025 | Mixture-of-Experts, Text Generation |
| A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction | Aug 1, 2023 | Mixture-of-Experts, Position |
| Explainable data-driven modeling via mixture of experts: towards effective blending of grey and black-box models | Jan 30, 2024 | Mixture-of-Experts |
| Ada-K Routing: Boosting the Efficiency of MoE-based LLMs | Oct 14, 2024 | Computational Efficiency, Mixture-of-Experts |