| Title | Date | Tasks | Code |
| --- | --- | --- | --- |
| Go Wider Instead of Deeper | Jul 25, 2021 | Image Classification, Mixture-of-Experts | Code Available |
| Gradient-free variational learning with conditional mixture networks | Aug 29, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available |
| Norface: Improving Facial Expression Analysis by Identity Normalization | Jul 22, 2024 | Classification, Emotion Recognition | Code Available |
| BiMediX: Bilingual Medical Mixture of Experts LLM | Feb 20, 2024 | Mixture-of-Experts, Multiple-choice | Code Available |
| MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Jul 18, 2023 | Language Modelling, Mixture-of-Experts | Code Available |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | May 19, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available |
| MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | May 9, 2025 | Mixture-of-Experts, Quantization | Code Available |
| GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Oct 15, 2024 | Explainable Recommendation, Language Modelling | Code Available |
| Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Aug 20, 2024 | Attribute, Mixture-of-Experts | Code Available |
| Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Nov 19, 2023 | Diversity, Mixture-of-Experts | Code Available |
| Frequency-Adaptive Pan-Sharpening with Mixture of Experts | Jan 4, 2024 | Mixture-of-Experts | Code Available |
| Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks | Nov 26, 2020 | Depth Estimation, Mixture-of-Experts | Code Available |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Jun 7, 2023 | Mixture-of-Experts | Code Available |
| Multi-Head Mixture-of-Experts | Apr 23, 2024 | Language Modelling | Code Available |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Jul 29, 2021 | Continual Learning, Few-Shot Learning | Code Available |
| Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization | Feb 19, 2024 | Attribute, Counterfactual | Code Available |
| MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Oct 18, 2024 | Language Modelling | Code Available |
| DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Mar 2, 2024 | Language Modelling, Large Language Model | Code Available |
| Multimodal Clinical Trial Outcome Prediction with Large Language Models | Feb 9, 2024 | Mixture-of-Experts, Prediction | Code Available |
| MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Jan 21, 2025 | iFun, Mixture-of-Experts | Code Available |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Jan 16, 2024 | GPU, Mixture-of-Experts | Code Available |
| MoExtend: Tuning New Experts for Modality and Task Extension | Aug 7, 2024 | Mixture-of-Experts | Code Available |
| EWMoE: An effective model for global weather forecasting with mixture-of-experts | May 9, 2024 | Mixture-of-Experts, Weather Forecasting | Code Available |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | Benchmarking, Mixture-of-Experts | Code Available |
| MoËT: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning | Jun 16, 2019 | Game of Go, Imitation Learning | Code Available |
| Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification | Apr 21, 2025 | Exemplar-Free, Knowledge Distillation | Code Available |
| Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts | Jan 18, 2021 | All, Mixture-of-Experts | Code Available |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available |
| XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection | Feb 27, 2024 | Language Modelling | Code Available |
| Distilling the Knowledge in a Neural Network | Mar 9, 2015 | Knowledge Distillation, Mixture-of-Experts | Code Available |
| DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unsupervised Dimensionality Reduction | Oct 25, 2024 | Dimensionality Reduction, Mixture-of-Experts | Code Available |
| Specialized federated learning using a mixture of experts | Oct 5, 2020 | Federated Learning, Mixture-of-Experts | Code Available |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available |
| FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | May 26, 2025 | Mixture-of-Experts | Code Available |
| Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Nov 16, 2024 | Mixture-of-Experts, Optical Character Recognition (OCR) | Code Available |
| FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Dec 22, 2023 | Mixture-of-Experts, Motion Generation | Code Available |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available |
| MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Apr 15, 2022 | Knowledge Distillation, Mixture-of-Experts | Code Available |
| DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis | May 22, 2024 | Diversity, Mixture-of-Experts | Code Available |
| Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | May 27, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available |
| MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Apr 9, 2025 | Computational Efficiency, Denoising | Code Available |
| Gated Multimodal Units for Information Fusion | Feb 7, 2017 | General Classification, Genre Classification | Code Available |
| MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Mar 4, 2025 | Font Generation, Mixture-of-Experts | Code Available |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Jun 17, 2024 | Mixture-of-Experts | Code Available |
| Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing | Jul 26, 2024 | Attribute, Language Modelling | Code Available |
| Efficient Dictionary Learning with Switch Sparse Autoencoders | Oct 10, 2024 | Dictionary Learning, Mixture-of-Experts | Code Available |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Oct 14, 2022 | CPU, Machine Translation | Code Available |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available |
| Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Jul 1, 2024 | GPU, Mixture-of-Experts | Code Available |
| Modality Interactive Mixture-of-Experts for Fake News Detection | Jan 21, 2025 | Fake News Detection, Misinformation | Code Available |