| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | Jun 14, 2023 | Diversity, Federated Learning | Unverified | 0 |
| ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Jun 10, 2023 | Efficient ViTs, Mixture-of-Experts | Code Available | 1 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | Jun 8, 2023 | Contrastive Learning, Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Jun 8, 2023 | Language Modeling, Language Modelling | Code Available | 0 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available | 2 |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Jun 7, 2023 | Mixture-of-Experts | Code Available | 1 |
| COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Jun 5, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | Jun 1, 2023 | Dialogue State Tracking, Mixture-of-Experts | Unverified | 0 |
| Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | May 30, 2023 | CPU, GPU | Code Available | 1 |