| Scaling Laws for Fine-Grained Mixture of Experts | Feb 12, 2024 | Mixture-of-Experts | CodeCode Available | 3 |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | Dec 23, 2023 | Instruction FollowingLanguage Modeling | CodeCode Available | 3 |
| MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | May 2, 2024 | Combinatorial OptimizationMixture-of-Experts | CodeCode Available | 3 |
| Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Oct 21, 2024 | Autonomous DrivingData Augmentation | CodeCode Available | 3 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16kCPU | CodeCode Available | 3 |
| Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models | Feb 10, 2024 | CPUGPU | CodeCode Available | 3 |
| MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Jan 8, 2024 | MambaMixture-of-Experts | CodeCode Available | 3 |
| A Survey on Mixture of Experts | Jun 26, 2024 | In-Context LearningMixture-of-Experts | CodeCode Available | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational EfficiencyDistributed Computing | CodeCode Available | 3 |
| AnyGraph: Graph Foundation Model in the Wild | Aug 20, 2024 | Graph LearningMixture-of-Experts | CodeCode Available | 3 |
| Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Jun 2, 2024 | DenoisingMixture-of-Experts | CodeCode Available | 3 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language ModellingLightweight Deployment | CodeCode Available | 2 |
| Mixture of Lookup Experts | Mar 20, 2025 | Mixture-of-Experts | CodeCode Available | 2 |
| Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Oct 14, 2024 | Mixture-of-Experts | CodeCode Available | 2 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Oct 24, 2023 | Language ModellingLarge Language Model | CodeCode Available | 2 |
| MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Sep 11, 2024 | Autonomous DrivingFeature Engineering | CodeCode Available | 2 |
| Mixture of A Million Experts | Jul 4, 2024 | Computational EfficiencyLanguage Modeling | CodeCode Available | 2 |
| MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection | Apr 12, 2024 | Mixture-of-Experts | CodeCode Available | 2 |
| LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Oct 20, 2024 | AllComputational Efficiency | CodeCode Available | 2 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Feb 24, 2025 | image-classificationImage Classification | CodeCode Available | 2 |
| Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models | May 23, 2024 | Mixture-of-ExpertsVisual Question Answering | CodeCode Available | 2 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | MathMixture-of-Experts | CodeCode Available | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Oct 8, 2024 | Mixture-of-ExpertsQuantization | CodeCode Available | 2 |
| LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Jan 7, 2025 | Mixture-of-ExpertsRepresentation Learning | CodeCode Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Expertsparameter-efficient fine-tuning | CodeCode Available | 2 |