| Title | Date | Topics | Code |
| --- | --- | --- | --- |
| GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts | Dec 7, 2023 | Diversity, Graph Neural Network | Code Available |
| Graph Sparsification via Mixture of Graphs | May 23, 2024 | Graph Learning, Mixture-of-Experts | Code Available |
| AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Aug 19, 2024 | Management, Mixture-of-Experts | Code Available |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available |
| MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Apr 15, 2022 | Knowledge Distillation, Mixture-of-Experts | Code Available |
| Mixture of Experts Meets Prompt-Based Continual Learning | May 23, 2024 | Continual Learning, Mixture-of-Experts | Code Available |
| Mixture-of-Linear-Experts for Long-term Time Series Forecasting | Dec 11, 2023 | Mixture-of-Experts, Time Series | Code Available |
| Mixture of Decision Trees for Interpretable Machine Learning | Nov 26, 2022 | Interpretable Machine Learning, Mixture-of-Experts | Code Available |
| Go Wider Instead of Deeper | Jul 25, 2021 | Image Classification, Mixture-of-Experts | Code Available |
| Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | Oct 14, 2024 | Federated Learning, Mixture-of-Experts | Code Available |
| AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | Aug 13, 2024 | Language Modelling, Mixture-of-Experts | Code Available |
| COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Jun 5, 2023 | Language Modelling | Code Available |
| Gated Multimodal Units for Information Fusion | Feb 7, 2017 | General Classification, Genre Classification | Code Available |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Jun 19, 2024 | ARC, Mixture-of-Experts | Code Available |
| Gradient-free variational learning with conditional mixture networks | Aug 29, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available |
| MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Mar 2, 2023 | Mixture-of-Experts, Question Answering | Code Available |
| Mixture of Attention Heads: Selecting Attention Heads Per Token | Oct 11, 2022 | Computational Efficiency, Language Modeling | Code Available |
| Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | May 1, 2025 | Mixture-of-Experts | Code Available |
| MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Apr 9, 2025 | Computational Efficiency, Denoising | Code Available |
| FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | May 26, 2025 | Mixture-of-Experts | Code Available |
| CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Feb 6, 2025 | Mixture-of-Experts | Code Available |
| MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering | May 5, 2021 | Clustering, Contrastive Learning | Code Available |
| Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Dec 16, 2021 | Generalizable Person Re-identification, Mixture-of-Experts | Code Available |
| Specialized federated learning using a mixture of experts | Oct 5, 2020 | Federated Learning, Mixture-of-Experts | Code Available |
| MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | May 19, 2024 | Mixture-of-Experts, Parameter-Efficient Fine-Tuning | Code Available |
| MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Apr 3, 2025 | Mixture-of-Experts, Quantization | Code Available |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Oct 2, 2023 | Mixture-of-Experts | Code Available |
| PAD-Net: An Efficient Framework for Dynamic Networks | Nov 10, 2022 | Image Classification | Code Available |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jun 7, 2024 | CPU, GPU | Code Available |
| Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Oct 15, 2023 | Computational Efficiency, Mixture-of-Experts | Code Available |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Jan 16, 2024 | GPU, Mixture-of-Experts | Code Available |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Feb 20, 2025 | Mixture-of-Experts, Question Answering | Code Available |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | Benchmarking, Mixture-of-Experts | Code Available |
| MedCoT: Medical Chain of Thought via Hierarchical Expert | Dec 18, 2024 | Diagnostic, Medical Visual Question Answering | Code Available |
| Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Oct 8, 2022 | Domain Generalization, Knowledge Distillation | Code Available |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | Feb 1, 2024 | Mixture-of-Experts, Task Arithmetic | Code Available |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Jul 29, 2021 | Continual Learning, Few-Shot Learning | Code Available |
| Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | May 27, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available |
| FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Dec 22, 2023 | Mixture-of-Experts, Motion Generation | Code Available |
| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-Experts, NeRF | Code Available |
| Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Apr 21, 2025 | Mixture-of-Experts | Code Available |
| M^4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | May 15, 2024 | Image Segmentation, Mixture-of-Experts | Code Available |
| Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Aug 26, 2020 | Interpretable Machine Learning, Mixture-of-Experts | Code Available |
| Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Aug 16, 2022 | Mixture-of-Experts | Code Available |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available |
| GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Oct 15, 2024 | Explainable Recommendation, Language Modelling | Code Available |
| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available |