| Memory Analysis on the Training Course of DeepSeek Models | Feb 11, 2025 | GPU, Mixture-of-Experts | —Unverified | 0 | 0 |
| Memory Augmented Language Models through Mixture of Word Experts | Nov 15, 2023 | Mixture-of-Experts | —Unverified | 0 | 0 |
| Toward generalizable learning of all (linear) first-order methods via memory augmented Transformers | Oct 8, 2024 | All, Mixture-of-Experts | —Unverified | 0 | 0 |
| Memory Clustering using Persistent Homology for Multimodality- and Discontinuity-Sensitive Learning of Optimal Control Warm-starts | Oct 2, 2020 | Clustering, Mixture-of-Experts | —Unverified | 0 | 0 |
| Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model | Dec 19, 2022 | GPU, Machine Translation | —Unverified | 0 | 0 |
| MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning | Oct 19, 2024 | Deep Reinforcement Learning, Mixture-of-Experts | —Unverified | 0 | 0 |
| MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | Feb 3, 2025 | Mathematical Reasoning, Mixture-of-Experts | —Unverified | 0 | 0 |
| MERLOT: A Distilled LLM-based Mixture-of-Experts Framework for Scalable Encrypted Traffic Classification | Nov 20, 2024 | Decoder, Language Modeling | —Unverified | 0 | 0 |
| MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jan 1, 2025 | image-classification, Image Classification | —Unverified | 0 | 0 |
| mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Jan 7, 2025 | Blocking, GPU | —Unverified | 0 | 0 |