| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modelling, Large Language Model | Code Available | 5 | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Jan 29, 2024 | Decoder, Mixture-of-Experts | Code Available | 5 | 5 |
| Parrot: Multilingual Visual Instruction Tuning | Jun 4, 2024 | Mixture-of-Experts | Code Available | 5 | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 | 5 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Code Available | 5 | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 8, 2024 | Instruction Following, Mixture-of-Experts | Code Available | 5 | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | Hallucination, Knowledge Graphs | Code Available | 5 | 5 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 | 5 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Apr 11, 2024 | GPU, Mixture-of-Experts | Code Available | 4 | 5 |