| MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Mar 2, 2023 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Mar 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| Mixture of Decision Trees for Interpretable Machine Learning | Nov 26, 2022 | Interpretable Machine Learning, Mixture-of-Experts | Code Available | 1 |
| Spatial Mixture-of-Experts | Nov 24, 2022 | Mixture-of-Experts | Code Available | 1 |
| PAD-Net: An Efficient Framework for Dynamic Networks | Nov 10, 2022 | image-classification, Image Classification | Code Available | 1 |
| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Oct 14, 2022 | CPU, Machine Translation | Code Available | 1 |
| Mixture of Attention Heads: Selecting Attention Heads Per Token | Oct 11, 2022 | Computational Efficiency, Language Modeling | Code Available | 1 |
| Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Oct 8, 2022 | Domain Generalization, Knowledge Distillation | Code Available | 1 |
| Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Aug 16, 2022 | Mixture-of-Experts | Code Available | 1 |
| Towards Understanding Mixture of Experts in Deep Learning | Aug 4, 2022 | Deep Learning, Mixture-of-Experts | Code Available | 1 |
| Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Jul 24, 2022 | Deep Reinforcement Learning, Humanoid Control | Code Available | 1 |
| Sparse Mixture-of-Experts are Domain Generalizable Learners | Jun 8, 2022 | Domain Generalization, Mixture-of-Experts | Code Available | 1 |
| Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Jun 3, 2022 | Decoder, Image Segmentation | Code Available | 1 |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 |
| StableMoE: Stable Routing Strategy for Mixture of Experts | Apr 18, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Apr 15, 2022 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |
| 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition | Apr 7, 2022 | Mixture-of-Experts, speech-recognition | Code Available | 1 |
| Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Mar 27, 2022 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Mar 13, 2022 | Abstractive Text Summarization, Document Summarization | Code Available | 1 |
| Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Mar 2, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Dec 29, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Dec 16, 2021 | Generalizable Person Re-identification, Mixture-of-Experts | Code Available | 1 |
| Unsupervised Foreground Extraction via Deep Region Competition | Oct 29, 2021 | Image Segmentation, Inductive Bias | Code Available | 1 |
| HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Oct 8, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |