| Taming Sparsely Activated Transformer with Stochastic Experts | Oct 8, 2021 | Machine Translation, Mixture-of-Experts | Code Available | 1 |
| Sparse MoEs meet Efficient Ensembles | Oct 7, 2021 | Few-Shot Learning, Mixture-of-Experts | Code Available | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Sep 9, 2021 | Mixture-of-Experts, Retrieval | Code Available | 1 |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Jul 29, 2021 | Continual Learning, Few-Shot Learning | Code Available | 1 |
| Go Wider Instead of Deeper | Jul 25, 2021 | Image Classification, Mixture-of-Experts | Code Available | 1 |
| Heterogeneous Multi-task Learning with Expert Diversity | Jun 20, 2021 | Diversity, Mixture-of-Experts | Code Available | 1 |
| Scaling Vision with Sparse Mixture of Experts | Jun 10, 2021 | Few-Shot Image Classification, Image Classification | Code Available | 1 |
| RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling | May 14, 2021 | Dialogue Generation, Language Modeling | Code Available | 1 |
| SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts | May 7, 2021 | Diversity, Mixture-of-Experts | Code Available | 1 |
| MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering | May 5, 2021 | Clustering, Contrastive Learning | Code Available | 1 |