| MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | cross-modal alignment, Decoder | Code Available | 0 | 5 |
| An Empirical Study on Model-agnostic Debiasing Strategies for Robust Natural Language Inference | Oct 8, 2020 | Data Augmentation, Mixture-of-Experts | Code Available | 0 | 5 |
| Build a Robust QA System with Transformer-based Mixture of Experts | Mar 20, 2022 | Data Augmentation, Mixture-of-Experts | Code Available | 0 | 5 |
| Embarrassingly Parallel Inference for Gaussian Processes | Feb 27, 2017 | Gaussian Processes, Mixture-of-Experts | Code Available | 0 | 5 |
| Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation | Nov 2, 2021 | Mixture-of-Experts | Code Available | 0 | 5 |
| MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts | Jul 13, 2024 | Diversity, Mixture-of-Experts | Code Available | 0 | 5 |
| Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts | May 25, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available | 0 | 5 |
| Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Feb 13, 2025 | Mixture-of-Experts | Code Available | 0 | 5 |
| Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Oct 22, 2023 | Language Modeling | Code Available | 0 | 5 |
| m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Feb 26, 2024 | Knowledge Distillation, Mixture-of-Experts | Code Available | 0 | 5 |