| Title | Date | Tasks | Code | Likes |
| --- | --- | --- | --- | --- |
| SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Jun 26, 2024 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Mixture of Experts in a Mixture of RL settings | Jun 26, 2024 | Deep Reinforcement Learning, Mixture-of-Experts | Unverified | 0 |
| MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias | Jun 25, 2024 | Mixture-of-Experts | Unverified | 0 |
| Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Jun 24, 2024 | Mixture-of-Experts | Code Available | 0 |
| Theory on Mixture-of-Experts in Continual Learning | Jun 24, 2024 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Code Available | 5 |
| OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Jun 24, 2024 | Language Modelling | Code Available | 0 |
| SimSMoE: Solving Representational Collapse via Similarity Measure | Jun 22, 2024 | Mixture-of-Experts | Unverified | 0 |
| Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation | Jun 19, 2024 | Continual Learning, Image Segmentation | Unverified | 0 |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Jun 19, 2024 | ARC, Mixture-of-Experts | Code Available | 1 |
| P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | Jun 18, 2024 | Mixture-of-Experts | Unverified | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Jun 18, 2024 | Code Generation, Mathematical Problem-Solving | Code Available | 0 |
| Variational Distillation of Diffusion Policies into Mixture of Experts | Jun 18, 2024 | Denoising, Mixture-of-Experts | Unverified | 0 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Jun 18, 2024 | Language Modelling | Code Available | 5 |
| Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding | Jun 17, 2024 | Mixture-of-Experts, Natural Language Understanding | Code Available | 0 |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Jun 17, 2024 | Mixture-of-Experts | Code Available | 1 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modelling | Code Available | 9 |
| Graph Knowledge Distillation to Mixture of Experts | Jun 17, 2024 | Knowledge Distillation, Mixture-of-Experts | Code Available | 0 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Jun 17, 2024 | Hallucination, Mixture-of-Experts | Code Available | 1 |
| Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction | Jun 14, 2024 | Mixture-of-Experts, Prediction | Unverified | 0 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Jun 14, 2024 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Jun 13, 2024 | Management, Mixture-of-Experts | Code Available | 1 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | Benchmarking, Mixture-of-Experts | Code Available | 1 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Code Available | 9 |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jun 7, 2024 | CPU, GPU | Code Available | 1 |
| MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Jun 7, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| Style Mixture of Experts for Expressive Text-To-Speech Synthesis | Jun 5, 2024 | Mixture-of-Experts, Speech Synthesis | Unverified | 0 |
| Continual Traffic Forecasting via Mixture of Experts | Jun 5, 2024 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach | Jun 5, 2024 | Mixture-of-Experts, Node Classification | Unverified | 0 |
| Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models | Jun 5, 2024 | Mixture-of-Experts, Time Series | Unverified | 0 |
| Parrot: Multilingual Visual Instruction Tuning | Jun 4, 2024 | Mixture-of-Experts | Code Available | 5 |
| Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Jun 4, 2024 | Mixture-of-Experts | Code Available | 2 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Jun 3, 2024 | Language Modelling | Code Available | 4 |
| Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts – Physics Informed Neural Operator Forward Model | Jun 2, 2024 | Denoising, Mixture-of-Experts | Code Available | 3 |
| Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks | Jun 1, 2024 | ISAC, Mixture-of-Experts | Unverified | 0 |
| A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers | Jun 1, 2024 | Gaussian Processes, Mixture-of-Experts | Code Available | 0 |
| Training-efficient density quantum machine learning | May 30, 2024 | LEMMA, Mixture-of-Experts | Unverified | 0 |
| MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors | May 29, 2024 | Mixture-of-Experts, Model Editing | Unverified | 0 |
| Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization | May 29, 2024 | Mixture-of-Experts | Code Available | 0 |
| MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models | May 29, 2024 | Decoder, GPU | Unverified | 0 |
| LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design | May 28, 2024 | Mixture-of-Experts | Unverified | 0 |
| XTrack: Multimodal Training Boosts RGB-X Video Object Trackers | May 28, 2024 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| Yuan 2.0-M32: Mixture of Experts with Attention Router | May 28, 2024 | ARC, Math | Code Available | 2 |
| Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | May 27, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |
| A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts | May 26, 2024 | Binary Classification, Mixture-of-Experts | Unverified | 0 |
| Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | May 26, 2024 | Feature Selection, Mixture-of-Experts | Code Available | 2 |
| MoEUT: Mixture-of-Experts Universal Transformers | May 25, 2024 | Language Modelling | Code Available | 2 |
| Expert-Token Resonance: Redefining MoE Routing through Affinity-Driven Active Selection | May 24, 2024 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K, Mixture-of-Experts | Code Available | 7 |
| Statistical Advantages of Perturbing Cosine Router in Mixture of Experts | May 23, 2024 | Mixture-of-Experts | Unverified | 0 |