| Title | Date | Tags | Code | # |
| --- | --- | --- | --- | --- |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Oct 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available | 1 |
| LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Sep 25, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | Sep 25, 2023 | Density Estimation, Mixture-of-Experts | Unverified | 0 |
| Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Sep 11, 2023 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | Sep 8, 2023 | Mixture-of-Experts | Unverified | 0 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available | 1 |
| Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Sep 1, 2023 | Mixture-of-Experts | Code Available | 0 |
| Task-Based MoE for Multitask Multilingual Machine Translation | Aug 30, 2023 | Machine Translation, Mixture-of-Experts | Unverified | 0 |
| SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | Aug 29, 2023 | Mixture-of-Experts, object-detection | Unverified | 0 |
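Several of the papers listed above (most directly the entry on Top-K sparse softmax gating) build on the same sparse routing primitive. As a minimal sketch, assuming PyTorch and a hypothetical learned gating matrix `w_gate` (names and the softmax-after-top-k variant are illustrative, not any specific paper's implementation):

```python
import torch
import torch.nn.functional as F

def top_k_gate(x: torch.Tensor, w_gate: torch.Tensor, k: int = 2):
    """Sketch of top-k sparse softmax gating for an MoE layer.

    x:      (batch, d_model) token representations
    w_gate: (d_model, n_experts) gating weights (hypothetical name)
    Returns (weights, indices): softmax weights over the k selected
    experts and their indices, both of shape (batch, k).
    """
    logits = x @ w_gate                        # (batch, n_experts) router scores
    top_vals, top_idx = logits.topk(k, dim=-1) # keep only the k largest logits
    weights = F.softmax(top_vals, dim=-1)      # renormalize over the kept experts
    return weights, top_idx

# Usage: route a batch of 4 tokens among 8 experts, 2 experts per token.
x = torch.randn(4, 16)
w_gate = torch.randn(16, 8)
weights, idx = top_k_gate(x, w_gate, k=2)
```

Note that some implementations instead apply the softmax over all expert logits before selecting the top k; the variant above normalizes only over the retained logits.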