| Title | Date | Tasks | Code | # |
| --- | --- | --- | --- | --- |
| Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Jun 9, 2022 | Image Captioning, Image Classification | Code Available | 2 |
| Tutel: Adaptive Mixture-of-Experts at Scale | Jun 7, 2022 | Mixture-of-Experts, Object Detection | Code Available | 2 |
| Text2Human: Text-Driven Controllable Human Image Generation | May 31, 2022 | Diversity, Human Parsing | Code Available | 2 |
| MDFEND: Multi-domain Fake News Detection | Jan 4, 2022 | Fake News Detection, Mixture-of-Experts | Code Available | 2 |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Jan 11, 2021 | Language Modelling, Mixture-of-Experts | Code Available | 2 |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available | 2 |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Jun 13, 2025 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Jun 2, 2025 | Mixture-of-Experts, Unsupervised Pre-training | Code Available | 1 |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available | 1 |
| FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | May 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | May 24, 2025 | Mixture-of-Experts | Code Available | 1 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPU, Long-range modeling | Code Available | 1 |
| U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding | May 20, 2025 | cross-modal alignment, Language Modeling | Code Available | 1 |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | May 19, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 |
| MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | May 9, 2025 | Mixture-of-Experts, Quantization | Code Available | 1 |
| Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | May 1, 2025 | Mixture-of-Experts | Code Available | 1 |
| Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification | Apr 21, 2025 | Exemplar-Free, Knowledge Distillation | Code Available | 1 |
| Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Apr 21, 2025 | Mixture-of-Experts | Code Available | 1 |
| Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Apr 16, 2025 | Mixture-of-Experts | Code Available | 1 |
| C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Apr 10, 2025 | In-Context Learning, Mixture-of-Experts | Code Available | 1 |
| MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Apr 9, 2025 | Computational Efficiency, Denoising | Code Available | 1 |
| MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Apr 3, 2025 | Mixture-of-Experts, Quantization | Code Available | 1 |
| SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Mar 24, 2025 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 1 |
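Most entries above build on sparsely-gated Mixture-of-Experts routing, introduced in "Outrageously Large Neural Networks" and simplified to top-1 routing in "Switch Transformers". Below is a minimal PyTorch sketch of that routing pattern for orientation; the expert architecture, hyperparameters, and load-balancing loss are illustrative assumptions, not a reproduction of any listed paper's implementation.

```python
# Minimal sketch of a sparsely-gated top-k MoE layer (illustrative, not from
# any paper above). A linear router scores experts per token; each token is
# dispatched to its top-k experts and the outputs are combined by gate weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 1):
        super().__init__()
        self.top_k = top_k
        self.num_experts = num_experts
        self.gate = nn.Linear(d_model, num_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (tokens, d_model); flatten batch/sequence dims before calling.
        logits = self.gate(x)                              # (tokens, num_experts)
        probs = F.softmax(logits, dim=-1)
        weights, indices = probs.topk(self.top_k, dim=-1)  # top-k experts per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = indices[:, slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        # Switch-style auxiliary load-balancing loss (an assumed choice here):
        # penalizes correlation between router probability mass and the
        # fraction of tokens whose top-1 choice is each expert.
        density = probs.mean(dim=0)
        usage = F.one_hot(indices[:, 0], self.num_experts).float().mean(dim=0)
        aux_loss = self.num_experts * torch.sum(density * usage)
        return out, aux_loss
```

With `top_k=1` this matches the Switch-style router; `top_k=2` approximates the original sparsely-gated formulation, whose noise term for exploration is omitted here for brevity. Production systems such as Tutel additionally handle expert parallelism, capacity limits, and token dropping, which this sketch leaves out.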