| Paper | Date | Topics | Code |
| --- | --- | --- | --- |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available |
| MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Apr 15, 2022 | Knowledge Distillation, Mixture-of-Experts | Code Available |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available |
| Modality Interactive Mixture-of-Experts for Fake News Detection | Jan 21, 2025 | Fake News Detection, Misinformation | Code Available |
| Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Jul 1, 2024 | GPU, Mixture-of-Experts | Code Available |
| Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters | Feb 1, 2024 | Mixture-of-Experts, Parameter-Efficient Fine-Tuning | Code Available |
| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-Experts, NeRF | Code Available |
| Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Oct 7, 2024 | Benchmarking, Mixture-of-Experts | Code Available |
| MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Apr 9, 2025 | Computational Efficiency, Denoising | Code Available |
| Distilling the Knowledge in a Neural Network | Mar 9, 2015 | Knowledge Distillation, Mixture-of-Experts | Code Available |
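The common thread in the entries above is the sparse Mixture-of-Experts layer: a learned gate routes each token to a small number of expert networks and mixes their outputs by the gate weights. The sketch below is a minimal, generic illustration of that top-k gating pattern, not the implementation from any listed paper; the class name `TopKMoE` and all dimensions are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparse MoE layer (illustrative sketch): a linear gate scores
    the experts per token, the top-k experts are run, and their outputs are
    combined with softmax-renormalized gate weights."""

    def __init__(self, d_model: int, d_hidden: int, n_experts: int = 4, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)  # router: token -> expert scores
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        scores = self.gate(x)                             # (tokens, n_experts)
        topv, topi = scores.topk(self.k, dim=-1)          # keep the k best experts
        weights = F.softmax(topv, dim=-1)                 # renormalize over top-k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            idx = topi[:, slot]                           # chosen expert per token
            w = weights[:, slot].unsqueeze(-1)            # that expert's gate weight
            for e, expert in enumerate(self.experts):
                mask = idx == e                           # tokens routed to expert e
                if mask.any():
                    out[mask] += w[mask] * expert(x[mask])
        return out

# Usage: route 8 tokens of width 16 through 4 experts, 2 active per token.
layer = TopKMoE(d_model=16, d_hidden=32, n_experts=4, k=2)
print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])
```

Because only k of the n experts run per token, compute grows with k while parameter count grows with n, which is the efficiency argument several of the papers above (e.g., the expert-pruning and MoEBERT entries) build on.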