| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | May 29, 2023 | Image Generation, Mixture-of-Experts | Code Available | 0 |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available | 1 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | May 25, 2023 | Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | May 24, 2023 | Mixture-of-Experts, Zero-shot Generalization | Unverified | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | May 23, 2023 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | May 23, 2023 | Avg, Language Modeling | Unverified | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | May 23, 2023 | Citation Prediction, Contrastive Learning | Unverified | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | May 22, 2023 | Mixture-of-Experts | Unverified | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | May 20, 2023 | Lifelong Learning, Mixture-of-Experts | Unverified | 0 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | May 20, 2023 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |