| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Nov 1, 2024 | Benchmarking, Mixture-of-Experts | Code Available | 1 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | May 20, 2023 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |
| Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Jul 24, 2022 | Deep Reinforcement Learning, Humanoid Control | Code Available | 1 |
| PAD-Net: An Efficient Framework for Dynamic Networks | Nov 10, 2022 | Image Classification | Code Available | 1 |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Feb 10, 2020 | Language Modelling, Mixture-of-Experts | Code Available | 1 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Feb 20, 2025 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| Layerwise Recurrent Router for Mixture-of-Experts | Aug 13, 2024 | Attribute, Mixture-of-Experts | Code Available | 1 |
| M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Jul 24, 2024 | Mixture-of-Experts, Multiple Instance Learning | Code Available | 1 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPU, Long-range modeling | Code Available | 1 |
| CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Feb 6, 2025 | Mixture-of-Experts | Code Available | 1 |
| LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Oct 21, 2024 | Image Dehazing, Mamba | Code Available | 1 |
| RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling | May 14, 2021 | Dialogue Generation, Language Modeling | Code Available | 1 |
| Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Feb 10, 2025 | Diversity, Language Modeling | Code Available | 1 |
| HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Dec 12, 2023 | Mixture-of-Experts | Code Available | 1 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Oct 18, 2023 | Blind Super-Resolution, Decoder | Code Available | 1 |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Jun 19, 2024 | ARC, Mixture-of-Experts | Code Available | 1 |
| Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Aug 26, 2020 | Interpretable Machine Learning, Mixture-of-Experts | Code Available | 1 |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 |
| C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Apr 10, 2025 | In-Context Learning, Mixture-of-Experts | Code Available | 1 |
| HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts | Feb 20, 2024 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Sep 9, 2021 | Mixture-of-Experts, Retrieval | Code Available | 1 |
| Heterogeneous Multi-task Learning with Expert Diversity | Jun 20, 2021 | Diversity, Mixture-of-Experts | Code Available | 1 |