| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 |
| LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Aug 28, 2024 | Computational Efficiency, Hallucination | Code Available | 3 |
| A Survey on Mixture of Experts | Jun 26, 2024 | In-Context Learning, Mixture-of-Experts | Code Available | 3 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | May 4, 2025 | Mixture-of-Experts, NeRF | Code Available | 3 |
| YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Jul 5, 2024 | Drum Transcription, Drum Transcription in Music (DTM) | Code Available | 3 |
| Scaling Laws for Fine-Grained Mixture of Experts | Feb 12, 2024 | Mixture-of-Experts | Code Available | 3 |
| BlackMamba: Mixture of Experts for State-Space Models | Feb 1, 2024 | Language Modeling, Language Modelling | Code Available | 3 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3 |
| Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Jun 2, 2024 | Denoising, Mixture-of-Experts | Code Available | 3 |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | Dec 23, 2023 | Instruction Following, Language Modeling | Code Available | 3 |
| MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts | Jan 8, 2024 | Mamba, Mixture-of-Experts | Code Available | 3 |
| Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Sep 11, 2023 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Harder Tasks Need More Experts: Dynamic Routing in MoE Models | Mar 12, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Oct 25, 2023 | GPU, Mixture-of-Experts | Code Available | 2 |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Jan 5, 2024 | Arithmetic Reasoning, Code Generation | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Dec 9, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Feb 22, 2024 | All, Mixture-of-Experts | Code Available | 2 |
| Open-RAG: Enhanced Retrieval-Augmented Reasoning with Open-Source Large Language Models | Oct 2, 2024 | Mixture-of-Experts, Navigate | Code Available | 2 |
| Motion In-Betweening with Phase Manifolds | Aug 24, 2023 | Mixture-of-Experts, motion in-betweening | Code Available | 2 |
| Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Mar 26, 2024 | Decoder, Mixture-of-Experts | Code Available | 2 |
| Flex-MoE: Modeling Arbitrary Modality Combination via the Flexible Mixture-of-Experts | Oct 10, 2024 | Mixture-of-Experts | Code Available | 2 |
| Monet: Mixture of Monosemantic Experts for Transformers | Dec 5, 2024 | Dictionary Learning, Mixture-of-Experts | Code Available | 2 |
| Fast Feedforward Networks | Aug 28, 2023 | Mixture-of-Experts | Code Available | 2 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 |
| Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Apr 16, 2024 | image-classification, Image Classification | Code Available | 2 |
| MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Jun 7, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| MoEUT: Mixture-of-Experts Universal Transformers | May 25, 2024 | Language Modeling, Language Modelling | Code Available | 2 |
| No Language Left Behind: Scaling Human-Centered Machine Translation | Jul 11, 2022 | Machine Translation, Mixture-of-Experts | Code Available | 2 |
| ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Dec 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Oct 14, 2024 | Mixture-of-Experts | Code Available | 2 |
| Mixture of A Million Experts | Jul 4, 2024 | Computational Efficiency, Language Modeling | Code Available | 2 |
| Mixture of Lookup Experts | Mar 20, 2025 | Mixture-of-Experts | Code Available | 2 |
| Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models | May 23, 2024 | Mixture-of-Experts, Visual Question Answering | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Sep 11, 2024 | Autonomous Driving, Feature Engineering | Code Available | 2 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available | 2 |
| LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration | Oct 20, 2024 | All, Computational Efficiency | Code Available | 2 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Feb 24, 2025 | image-classification, Image Classification | Code Available | 2 |
| Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Jun 4, 2024 | Mixture-of-Experts | Code Available | 2 |
| MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Oct 8, 2024 | Mixture-of-Experts, Quantization | Code Available | 2 |
| Delta Decompression for MoE-based LLMs Compression | Feb 24, 2025 | Diversity, Mixture-of-Experts | Code Available | 2 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Oct 24, 2023 | Language Modelling, Large Language Model | Code Available | 2 |
| LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training | Nov 24, 2024 | Math, Mixture-of-Experts | Code Available | 2 |
| MDFEND: Multi-domain Fake News Detection | Jan 4, 2022 | Fake News Detection, Mixture-of-Experts | Code Available | 2 |
| Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | May 26, 2024 | feature selection, Mixture-of-Experts | Code Available | 2 |
| Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Jul 7, 2025 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Aug 21, 2024 | Mixture-of-Experts, Time Series | Code Available | 2 |
| A Closer Look into Mixture-of-Experts in Large Language Models | Jun 26, 2024 | Computational Efficiency, Diversity | Code Available | 2 |