| Title | Date | Tags | Code | Rating |
| --- | --- | --- | --- | --- |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Code Available | 5/5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modelling, Large Language Model | Code Available | 5/5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | Hallucination, Knowledge Graphs | Code Available | 5/5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Jan 29, 2024 | Decoder, Mixture-of-Experts | Code Available | 5/5 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Jun 30, 2022 | CPU, GPU | Code Available | 4/5 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Apr 11, 2024 | GPU, Mixture-of-Experts | Code Available | 4/5 |
| OLMoE: Open Mixture-of-Experts Language Models | Sep 3, 2024 | Language Modelling | Code Available | 4/5 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Oct 9, 2024 | GPU, Mixture-of-Experts | Code Available | 4/5 |
| Training Sparse Mixture Of Experts Text Embedding Models | Feb 11, 2025 | Mixture-of-Experts, RAG | Code Available | 4/5 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4/5 |
| Mixtral of Experts | Jan 8, 2024 | Code Generation, Common Sense Reasoning | Code Available | 4/5 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Dec 28, 2023 | Mixture-of-Experts, Quantization | Code Available | 4/5 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Jun 3, 2024 | Language Modelling | Code Available | 4/5 |
| Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Jul 2, 2024 | Mixture-of-Experts, Parameter-Efficient Fine-Tuning | Code Available | 4/5 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Oct 15, 2024 | Mixture-of-Experts | Code Available | 4/5 |
| Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Sep 24, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 4/5 |
| BlackMamba: Mixture of Experts for State-Space Models | Feb 1, 2024 | Language Modelling | Code Available | 3/5 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | May 4, 2025 | Mixture-of-Experts, NeRF | Code Available | 3/5 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3/5 |
| MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | May 2, 2024 | Combinatorial Optimization, Mixture-of-Experts | Code Available | 3/5 |
| A Survey on Mixture of Experts | Jun 26, 2024 | In-Context Learning, Mixture-of-Experts | Code Available | 3/5 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3/5 |
| AnyGraph: Graph Foundation Model in the Wild | Aug 20, 2024 | Graph Learning, Mixture-of-Experts | Code Available | 3/5 |
| Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Oct 21, 2024 | Autonomous Driving, Data Augmentation | Code Available | 3/5 |
| MoAI: Mixture of All Intelligence for Large Language and Vision Models | Mar 12, 2024 | All, Mixture-of-Experts | Code Available | 3/5 |
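
Most of the papers listed above build on the same core mechanism: a learned router scores the experts for each token and dispatches the token to only its top-k experts, so compute stays roughly constant as the parameter count grows. The sketch below is a minimal, illustrative PyTorch version of that shared top-k gating pattern, not the method of any specific paper in the table; the names `ExpertMLP` and `SparseMoE` and all sizes (`d_model=512`, `d_hidden=2048`, `n_experts=8`, `top_k=2`) are assumed for demonstration.

```python
# Minimal sketch of sparse top-k MoE routing (illustrative; sizes and names
# are assumptions, not taken from any paper in the table above).
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """One feed-forward expert (hypothetical dimensions)."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class SparseMoE(nn.Module):
    """Token-level top-k gating: each token goes to its k highest-scoring experts."""

    def __init__(self, d_model: int = 512, d_hidden: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            ExpertMLP(d_model, d_hidden) for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # the router
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.gate(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)  # keep k experts per token
        weights = F.softmax(weights, dim=-1)            # renormalize over kept experts
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            # Gather the tokens routed to expert e and mix its output back in.
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += (
                    weights[token_ids, slot].unsqueeze(-1) * expert(x[token_ids])
                )
        return out


tokens = torch.randn(16, 512)
print(SparseMoE()(tokens).shape)  # torch.Size([16, 512])
```

The per-expert Python loop is kept only for readability; production systems such as DeepSpeed Inference instead batch tokens by expert and fuse the dispatch and combine steps, and training-time variants typically add a load-balancing auxiliary loss so the router does not collapse onto a few experts.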