| Title | Date | Tags | Code | # |
|---|---|---|---|---|
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Jun 13, 2025 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Jun 2, 2025 | Mixture-of-Experts, Unsupervised Pre-training | Code Available | 1 |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available | 1 |
| FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | May 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | May 24, 2025 | Mixture-of-Experts | Code Available | 1 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPU, Long-range modeling | Code Available | 1 |
| U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | May 20, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | May 19, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |