| MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Apr 3, 2025 | Mixture-of-Experts, Quantization | Code Available | 1 | 5 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available | 1 | 5 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Oct 2, 2023 | Mixture-of-Experts | Code Available | 1 | 5 |
| PAD-Net: An Efficient Framework for Dynamic Networks | Nov 10, 2022 | Image Classification | Code Available | 1 | 5 |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jun 7, 2024 | CPU, GPU | Code Available | 1 | 5 |
| Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Oct 15, 2023 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 | 5 |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Jan 16, 2024 | GPU, Mixture-of-Experts | Code Available | 1 | 5 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Feb 20, 2025 | Mixture-of-Experts, Question Answering | Code Available | 1 | 5 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | Benchmarking, Mixture-of-Experts | Code Available | 1 | 5 |
| MedCoT: Medical Chain of Thought via Hierarchical Expert | Dec 18, 2024 | Diagnostic, Medical Visual Question Answering | Code Available | 1 | 5 |
| Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Oct 8, 2022 | Domain Generalization, Knowledge Distillation | Code Available | 1 | 5 |
| Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | Feb 1, 2024 | Mixture-of-Experts, Task Arithmetic | Code Available | 1 | 5 |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Jul 29, 2021 | Continual Learning, Few-Shot Learning | Code Available | 1 | 5 |
| Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | May 27, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 | 5 |
| FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Dec 22, 2023 | Mixture-of-Experts, Motion Generation | Code Available | 1 | 5 |
| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-Experts, NeRF | Code Available | 1 | 5 |
| Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Apr 21, 2025 | Mixture-of-Experts | Code Available | 1 | 5 |
| M^4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | May 15, 2024 | Image Segmentation, Mixture-of-Experts | Code Available | 1 | 5 |
| Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Aug 26, 2020 | Interpretable Machine Learning, Mixture-of-Experts | Code Available | 1 | 5 |
| Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Aug 16, 2022 | Mixture-of-Experts | Code Available | 1 | 5 |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available | 1 | 5 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 | 5 |
| GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Oct 15, 2024 | Explainable Recommendation, Language Modelling | Code Available | 1 | 5 |
| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 | 5 |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 | 5 |