| Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer | Oct 15, 2023 | Diversity, Mixture-of-Experts | —Unverified | 0 |
| Adaptive Gating in Mixture-of-Experts based Language Models | Oct 11, 2023 | Mixture-of-Experts | —Unverified | 0 |
| Beyond the Typical: Modeling Rare Plausible Patterns in Chemical Reactions by Leveraging Sequential Mixture-of-Experts | Oct 7, 2023 | Mixture-of-Experts | —Unverified | 0 |
| Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion | Oct 6, 2023 | Mixture-of-Experts | Code Available | 0 |
| Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition | Oct 4, 2023 | Mixture-of-Experts, reinforcement-learning | —Unverified | 0 |
| Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness | Oct 3, 2023 | GPU, Machine Translation | —Unverified | 0 |
| FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models | Oct 3, 2023 | Face Transfer, Mixture-of-Experts | Code Available | 0 |
| Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | Sep 25, 2023 | Density Estimation, Mixture-of-Experts | —Unverified | 0 |
| Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | Sep 8, 2023 | Mixture-of-Experts | —Unverified | 0 |
| Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Sep 1, 2023 | Mixture-of-Experts | Code Available | 0 |