| GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | Jun 30, 2020 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| Deep Mixture of Experts via Shallow Embedding | Jun 5, 2018 | Few-Shot Learning, Meta-Learning | Code Available | 0 |
| Build a Robust QA System with Transformer-based Mixture of Experts | Mar 20, 2022 | Data Augmentation, Mixture-of-Experts | Code Available | 0 |
| TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Jan 10, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 0 |
| DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts | Mar 20, 2024 | Information Retrieval, Mixture-of-Experts | Code Available | 0 |
| DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | Jan 14, 2022 | Decoder, Mixture-of-Experts | Code Available | 0 |
| SEKE: Specialised Experts for Keyword Extraction | Dec 18, 2024 | Descriptive, Keyword Extraction | Code Available | 0 |
| Mixture of Link Predictors on Graphs | Feb 13, 2024 | Link Prediction, Mixture-of-Experts | Code Available | 0 |
| Mixture-of-Experts Variational Autoencoder for Clustering and Generating from Similarity-Based Representations on Single Cell Data | Oct 17, 2019 | Clustering, Decoder | Code Available | 0 |
| Opponent Modeling in Deep Reinforcement Learning | Sep 18, 2016 | Deep Reinforcement Learning, Mixture-of-Experts | Code Available | 0 |