| How Lightweight Can A Vision Transformer Be | Jul 25, 2024 | Mixture-of-Experts, Transfer Learning | —Unverified | 0 |
| FedMerge: Federated Personalization via Model Merging | Apr 9, 2025 | Federated Learning, Mixture-of-Experts | —Unverified | 0 |
| Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers | May 28, 2022 | Machine Translation, Mixture-of-Experts | —Unverified | 0 |
| A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts | May 26, 2024 | Binary Classification, Mixture-of-Experts | —Unverified | 0 |
| Coordination with Humans via Strategy Matching | Oct 27, 2022 | Mixture-of-Experts | —Unverified | 0 |
| GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input | Jun 1, 2021 | Mixture-of-Experts, named-entity-recognition | —Unverified | 0 |
| Generalizable Person Re-identification with Relevance-aware Mixture of Experts | May 19, 2021 | Generalizable Person Re-identification, Mixture-of-Experts | —Unverified | 0 |
| Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study | Mar 26, 2024 | Learning Theory, Mixture-of-Experts | —Unverified | 0 |
| How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Feb 17, 2025 | Mixture-of-Experts | —Unverified | 0 |
| Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | May 21, 2025 | Chatbot, Instruction Following | —Unverified | 0 |