| Title | Date | Topics | Code |
| --- | --- | --- | --- |
| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Oct 14, 2022 | CPU, Machine Translation | Code Available |
| Mixture of Attention Heads: Selecting Attention Heads Per Token | Oct 11, 2022 | Computational Efficiency, Language Modeling | Code Available |
| Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Oct 8, 2022 | Domain Generalization, Knowledge Distillation | Code Available |
| Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Aug 16, 2022 | Mixture-of-Experts | Code Available |
| Towards Understanding Mixture of Experts in Deep Learning | Aug 4, 2022 | Deep Learning, Mixture-of-Experts | Code Available |
| Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Jul 24, 2022 | Deep Reinforcement Learning, Humanoid Control | Code Available |
| Sparse Mixture-of-Experts are Domain Generalizable Learners | Jun 8, 2022 | Domain Generalization, Mixture-of-Experts | Code Available |
| Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Jun 3, 2022 | Decoder, Image Segmentation | Code Available |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available |