| HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Oct 8, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |
| Sparse MoEs meet Efficient Ensembles | Oct 7, 2021 | Few-Shot Learning, Mixture-of-Experts | Code Available | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Sep 9, 2021 | Mixture-of-Experts, Retrieval | Code Available | 1 |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Jul 29, 2021 | Continual Learning, Few-Shot Learning | Code Available | 1 |
| Go Wider Instead of Deeper | Jul 25, 2021 | Image Classification, Mixture-of-Experts | Code Available | 1 |
| Heterogeneous Multi-task Learning with Expert Diversity | Jun 20, 2021 | Diversity, Mixture-of-Experts | Code Available | 1 |
| Scaling Vision with Sparse Mixture of Experts | Jun 10, 2021 | Few-Shot Image Classification, Image Classification | Code Available | 1 |
| RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling | May 14, 2021 | Dialogue Generation, Language Modeling | Code Available | 1 |
| SpeechMoE: Scaling to Large Acoustic Models with Dynamic Routing Mixture of Experts | May 7, 2021 | Diversity, Mixture-of-Experts | Code Available | 1 |
| MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering | May 5, 2021 | Clustering, Contrastive Learning | Code Available | 1 |
| Cross-Domain Label-Adaptive Stance Detection | Apr 15, 2021 | Domain Adaptation, Mixture-of-Experts | Code Available | 1 |
| VDSM: Unsupervised Video Disentanglement with State-Space Modeling and Deep Mixtures of Experts | Mar 12, 2021 | Decoder, Disentanglement | Code Available | 1 |
| Real-time Relevant Recommendation Suggestion | Mar 8, 2021 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 |
| Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts | Jan 18, 2021 | All, Mixture-of-Experts | Code Available | 1 |
| PFL-MoE: Personalized Federated Learning Based on Mixture of Experts | Dec 31, 2020 | Decision Making, Federated Learning | Code Available | 1 |
| Multi-view Depth Estimation using Epipolar Spatio-Temporal Networks | Nov 26, 2020 | Depth Estimation, Mixture-of-Experts | Code Available | 1 |
| Specialized federated learning using a mixture of experts | Oct 5, 2020 | Federated Learning, Mixture-of-Experts | Code Available | 1 |
| Transformer Based Multi-Source Domain Adaptation | Sep 16, 2020 | Domain Adaptation, Mixture-of-Experts | Code Available | 1 |
| Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Aug 26, 2020 | Interpretable Machine Learning, Mixture-of-Experts | Code Available | 1 |
| Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes | Jun 19, 2020 | Continual Learning, Decision Making | Code Available | 1 |
| Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Feb 10, 2020 | Language Modelling, Mixture-of-Experts | Code Available | 1 |
| Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models | Nov 8, 2019 | Mixture-of-Experts | Code Available | 1 |
| MoËT: Mixture of Expert Trees and its Application to Verifiable Reinforcement Learning | Jun 16, 2019 | Game of Go, Imitation Learning | Code Available | 1 |
| Gated Multimodal Units for Information Fusion | Feb 7, 2017 | General Classification, Genre classification | Code Available | 1 |
| Distilling the Knowledge in a Neural Network | Mar 9, 2015 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |
| GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving | Jul 19, 2025 | Autonomous Driving, Bench2Drive | Code Available | 0 |
| R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning | Jul 17, 2025 | Mixture-of-Experts | Code Available | 0 |
| Mixture of Experts in Large Language Models | Jul 15, 2025 | Diversity, Language Modeling | Unverified | 0 |
| Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive | Jul 13, 2025 | CPU, Interactive Segmentation | Unverified | 0 |
| KAT-V1: Kwai-AutoThink Technical Report | Jul 11, 2025 | Knowledge Distillation, Large Language Model | Unverified | 0 |
| A Survey on Prompt Tuning | Jul 8, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 0 |
| Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Jul 8, 2025 | Data Augmentation, Mixture-of-Experts | Unverified | 0 |
| What You Have is What You Track: Adaptive and Robust Multimodal Tracking | Jul 8, 2025 | Mixture-of-Experts, Visual Tracking | Code Available | 0 |
| Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | Jul 8, 2025 | Continual Learning, Mixture-of-Experts | Code Available | 0 |
| Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach | Jul 8, 2025 | Edge-computing, Federated Learning | Unverified | 0 |
| UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification | Jul 7, 2025 | Mixture-of-Experts | Unverified | 0 |
| Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Jun 29, 2025 | Inference Optimization, Mixture-of-Experts | Code Available | 0 |
| EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Jun 26, 2025 | Compositional Zero-Shot Learning, Mixture-of-Experts | Unverified | 0 |
| Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Jun 26, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jun 26, 2025 | Mixture-of-Experts | Code Available | 0 |
| Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | Jun 25, 2025 | Clinical Knowledge, Computed Tomography (CT) | Unverified | 0 |
| Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | Jun 23, 2025 | Mixture-of-Experts, Safety Alignment | Unverified | 0 |
| An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | Jun 23, 2025 | Click-Through Rate Prediction, Mixture-of-Experts | Unverified | 0 |
| SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | Jun 20, 2025 | Mixture-of-Experts, Response Generation | Unverified | 0 |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Jun 17, 2025 | GPU, Large Language Model | Unverified | 0 |
| Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jun 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Jun 17, 2025 | Data Integration, Large Language Model | Unverified | 0 |
| Exploring Speaker Diarization with Mixture of Experts | Jun 17, 2025 | Mixture-of-Experts, speaker-diarization | Unverified | 0 |
| MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Jun 17, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| Single-Example Learning in a Mixture of GPDMs with Latent Geometries | Jun 17, 2025 | Mixture-of-Experts | Unverified | 0 |