| Title | Date | Tags | Code | Count |
|---|---|---|---|---|
| Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE | Nov 5, 2023 | Decoder, Mixture-of-Experts | Code Available | 0 |
| Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach | Nov 1, 2023 | Domain Adaptation, Mixture-of-Experts | Unverified | 0 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Oct 29, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Oct 25, 2023 | GPU, Mixture-of-Experts | Code Available | 2 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Oct 24, 2023 | Language Modelling, Large Language Model | Code Available | 2 |
| SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation | Oct 24, 2023 | Code Generation, Code Translation | Code Available | 1 |
| A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts | Oct 22, 2023 | Density Estimation, Mixture-of-Experts | Unverified | 0 |
| Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Oct 22, 2023 | Language Modelling | Code Available | 0 |
| Direct Neural Machine Translation with Task-level Mixture of Experts models | Oct 18, 2023 | Direct NMT, Large Language Model | Unverified | 0 |
| Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs | Oct 18, 2023 | Contrastive Learning, Entity Typing | Code Available | 0 |