| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K, Mixture-of-Experts | Code Available | 7 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Jan 14, 2025 | Mixture-of-Experts | Code Available | 7 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | May 28, 2025 | Image Generation, Mixture-of-Experts | Code Available | 7 |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Jun 16, 2025 | Mixture-of-Experts, Reinforcement Learning (RL) | Code Available | 7 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modelling, Large Language Model | Code Available | 5 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Code Available | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 8, 2024 | Instruction Following, Mixture-of-Experts | Code Available | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Code Available | 5 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Jun 18, 2024 | Language Modelling | Code Available | 5 |