| Title | Date | Tags | Code | Stars |
| --- | --- | --- | --- | --- |
| An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | Jun 29, 2023 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | Jun 28, 2023 | Mixture-of-Experts, Natural Language Understanding | Unverified | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | Jun 19, 2023 | In-Context Learning, Language Modeling | Unverified | 0 |
| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | Jun 14, 2023 | Diversity, Federated Learning | Unverified | 0 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | Jun 8, 2023 | Contrastive Learning, Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Jun 8, 2023 | Language Modeling | Code Available | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | Jun 1, 2023 | Dialogue State Tracking, Mixture-of-Experts | Unverified | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | May 29, 2023 | Image Generation, Mixture-of-Experts | Code Available | 0 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | May 25, 2023 | Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | May 24, 2023 | Mixture-of-Experts, Zero-shot Generalization | Unverified | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | May 23, 2023 | Citation Prediction, Contrastive Learning | Unverified | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | May 23, 2023 | Language Modeling | Unverified | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | May 23, 2023 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | May 22, 2023 | Mixture-of-Experts | Unverified | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | May 20, 2023 | Lifelong Learning, Mixture-of-Experts | Unverified | 0 |
| Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts | May 12, 2023 | Ensemble Learning, Mixture-of-Experts | Unverified | 0 |
| Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition | May 12, 2023 | Bayesian Inference, Mixture-of-Experts | Unverified | 0 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | May 10, 2023 | Classification, Image Classification | Unverified | 0 |
| Demystifying Softmax Gating Function in Gaussian Mixture of Experts | May 5, 2023 | Mixture-of-Experts, Parameter Estimation | Unverified | 0 |
| Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising | May 5, 2023 | Decoder, Denoising | Unverified | 0 |
| Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity | May 3, 2023 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | Apr 22, 2023 | Mixture-of-Experts | Unverified | 0 |
| Revisiting Single-gated Mixtures of Experts | Apr 11, 2023 | Mixture-of-Experts | Unverified | 0 |
| FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | Apr 8, 2023 | Mixture-of-Experts, Scheduling | Unverified | 0 |
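Most entries above build on the same core mechanism: a sparsely activated mixture-of-experts layer, in which a learned gate scores a pool of expert subnetworks and routes each token to only its top-k experts. The sketch below illustrates that shared mechanism in PyTorch; it is a minimal reference version, not the implementation of any paper listed here, and all names and dimensions (`TopKMoE`, `d_model`, `n_experts`, `k`) are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    """Minimal sparsely activated MoE layer (illustrative sketch): a learned
    gate scores all experts, keeps the top-k per token, and mixes those
    experts' outputs weighted by the renormalized gate scores."""

    def __init__(self, d_model=64, d_hidden=128, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.ReLU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (n_tokens, d_model)
        scores = self.gate(x)                           # (n_tokens, n_experts)
        top_vals, top_idx = scores.topk(self.k, dim=-1) # keep k best experts per token
        weights = F.softmax(top_vals, dim=-1)           # renormalize over the chosen k
        out = torch.zeros_like(x)
        # Dense reference loop for clarity; real systems dispatch sparsely.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e            # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out


tokens = torch.randn(16, 64)
print(TopKMoE()(tokens).shape)  # torch.Size([16, 64])
```

The compute saving comes from running only k of n_experts experts per token; production systems (e.g., the pipeline-parallel and dynamic device-placement work listed above) replace the dense double loop with a sparse token-to-expert dispatch across devices.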