| Title | Date | Tags | Code Status | Code Links |
| --- | --- | --- | --- | --- |
| MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | Mar 15, 2023 | Deep Learning, GPU | Unverified | 0 |
| Scaling Vision-Language Models with Sparse Mixture of Experts | Mar 13, 2023 | Mixture-of-Experts | Unverified | 0 |
| A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | Mar 11, 2023 | Mixture-of-Experts | Unverified | 0 |
| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | Mar 10, 2023 | CPU, Decoder | Unverified | 0 |
| Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Mar 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Mar 2, 2023 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| Improving Expert Specialization in Mixture of Experts | Feb 28, 2023 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Improved Training of Mixture-of-Experts Language GANs | Feb 23, 2023 | Adversarial Text, Image Generation | Unverified | 0 |
| TMoE-P: Towards the Pareto Optimum for Multivariate Soft Sensors | Feb 21, 2023 | Mixture-of-Experts | Unverified | 0 |
| Massively Multilingual Shallow Fusion with Large Language Models | Feb 17, 2023 | Automatic Speech Recognition (ASR) | Unverified | 0 |