| Title | Date | Tags | Code | Likes |
|---|---|---|---|---|
| Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning | Apr 10, 2025 | Mixture-of-Experts, Reinforcement Learning | Unverified | 0 |
| Scaling Laws for Native Multimodal Models | Apr 10, 2025 | Mixture-of-Experts | Unverified | 0 |
| Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models | Apr 10, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 0 |
| FedMerge: Federated Personalization via Model Merging | Apr 9, 2025 | Federated Learning, Mixture-of-Experts | Unverified | 0 |
| Holistic Capability Preservation: Towards Compact Yet Comprehensive Reasoning Models | Apr 9, 2025 | Instruction Following, Mathematical Problem-Solving | Unverified | 0 |
| Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | Apr 8, 2025 | Instruction Following, Mixture-of-Experts | Unverified | 0 |
| HeterMoE: Efficient Training of Mixture-of-Experts Models on Heterogeneous GPUs | Apr 4, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation | Apr 4, 2025 | Change Detection, Depth Estimation | Unverified | 0 |
| MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism | Apr 3, 2025 | CPU, GPU | Unverified | 0 |
| Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design | Apr 2, 2025 | Attribute, Mixture-of-Experts | Unverified | 0 |