| Paper | Date | Tasks | Code |
| --- | --- | --- | --- |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available |
| Learning A Sparse Transformer Network for Effective Image Deraining | Mar 21, 2023 | Image Reconstruction, Image Restoration | Code Available |
| Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | Dec 9, 2022 | Mixture-of-Experts | Code Available |
| No Language Left Behind: Scaling Human-Centered Machine Translation | Jul 11, 2022 | Machine Translation, Mixture-of-Experts | Code Available |
| Towards Universal Sequence Representation Learning for Recommender Systems | Jun 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available |
| Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Jun 9, 2022 | Image Captioning, Image Classification | Code Available |
| Tutel: Adaptive Mixture-of-Experts at Scale | Jun 7, 2022 | Mixture-of-Experts, Object Detection | Code Available |
| Text2Human: Text-Driven Controllable Human Image Generation | May 31, 2022 | Diversity, Human Parsing | Code Available |
| MDFEND: Multi-domain Fake News Detection | Jan 4, 2022 | Fake News Detection, Mixture-of-Experts | Code Available |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Jan 11, 2021 | Language Modelling, Mixture-of-Experts | Code Available |