| CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | May 9, 2024 | Image Captioning, Instruction Following | Code Available | 2 | 5 |
| Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | May 26, 2024 | feature selection, Mixture-of-Experts | Code Available | 2 | 5 |
| Monet: Mixture of Monosemantic Experts for Transformers | Dec 5, 2024 | Dictionary Learning, Mixture-of-Experts | Code Available | 2 | 5 |
| ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Dec 19, 2024 | Mixture-of-Experts | Code Available | 2 | 5 |
| Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Oct 10, 2024 | Mixture-of-Experts | Code Available | 2 | 5 |
| MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection | Apr 12, 2024 | Mixture-of-Experts | Code Available | 2 | 5 |
| CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Nov 18, 2024 | fill-mask, Fill Mask | Code Available | 2 | 5 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available | 2 | 5 |
| Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Apr 16, 2024 | image-classification, Image Classification | Code Available | 2 | 5 |
| CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Sep 28, 2024 | image-classification, Image Classification | Code Available | 2 | 5 |
| Mixture of A Million Experts | Jul 4, 2024 | Computational Efficiency, Language Modeling | Code Available | 2 | 5 |
| Mixture of Lookup Experts | Mar 20, 2025 | Mixture-of-Experts | Code Available | 2 | 5 |
| MDFEND: Multi-domain Fake News Detection | Jan 4, 2022 | Fake News Detection, Mixture-of-Experts | Code Available | 2 | 5 |
| Fast Feedforward Networks | Aug 28, 2023 | Mixture-of-Experts | Code Available | 2 | 5 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Oct 8, 2024 | Mixture-of-Experts, Quantization | Code Available | 2 | 5 |
| MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Sep 11, 2024 | Autonomous Driving, Feature Engineering | Code Available | 2 | 5 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Oct 24, 2023 | Language Modelling, Large Language Model | Code Available | 2 | 5 |
| MoEUT: Mixture-of-Experts Universal Transformers | May 25, 2024 | Language Modeling, Language Modelling | Code Available | 2 | 5 |
| MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Jun 7, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 | 5 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 | 5 |
| Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Oct 14, 2024 | Mixture-of-Experts | Code Available | 2 | 5 |
| Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Jul 7, 2025 | Inductive Bias, Mixture-of-Experts | Code Available | 2 | 5 |
| LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Jan 7, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 2 | 5 |
| Learning A Sparse Transformer Network for Effective Image Deraining | Mar 21, 2023 | Image Reconstruction, Image Restoration | Code Available | 2 | 5 |
| A Closer Look into Mixture-of-Experts in Large Language Models | Jun 26, 2024 | Computational Efficiency, Diversity | Code Available | 2 | 5 |