| Title | Date | Tags | Code | Likes |
|---|---|---|---|---|
| SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | Jun 26, 2024 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| Mixture of Experts in a Mixture of RL settings | Jun 26, 2024 | Deep Reinforcement Learning, Mixture-of-Experts | Unverified | 0 |
| MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias | Jun 25, 2024 | Mixture-of-Experts | Unverified | 0 |
| Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Jun 24, 2024 | Mixture-of-Experts | Available | 0 |
| Theory on Mixture-of-Experts in Continual Learning | Jun 24, 2024 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Available | 5 |
| OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Jun 24, 2024 | Language Modelling | Available | 0 |
| SimSMoE: Solving Representational Collapse via Similarity Measure | Jun 22, 2024 | Mixture-of-Experts | Unverified | 0 |
| Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation | Jun 19, 2024 | Continual Learning, Image Segmentation | Unverified | 0 |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Jun 19, 2024 | ARC, Mixture-of-Experts | Available | 1 |
| P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | Jun 18, 2024 | Mixture-of-Experts | Unverified | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Jun 18, 2024 | Code Generation, Mathematical Problem-Solving | Available | 0 |
| Variational Distillation of Diffusion Policies into Mixture of Experts | Jun 18, 2024 | Denoising, Mixture-of-Experts | Unverified | 0 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Jun 18, 2024 | Language Modelling | Available | 5 |
| Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding | Jun 17, 2024 | Mixture-of-Experts, Natural Language Understanding | Available | 0 |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Jun 17, 2024 | Mixture-of-Experts | Available | 1 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modelling | Available | 9 |
| Graph Knowledge Distillation to Mixture of Experts | Jun 17, 2024 | Knowledge Distillation, Mixture-of-Experts | Available | 0 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Jun 17, 2024 | Hallucination, Mixture-of-Experts | Available | 1 |
| Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction | Jun 14, 2024 | Mixture-of-Experts, Prediction | Unverified | 0 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Jun 14, 2024 | Mixture-of-Experts, Multi-Task Learning | Available | 1 |
| DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Jun 13, 2024 | Management, Mixture-of-Experts | Available | 1 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | Benchmarking, Mixture-of-Experts | Available | 1 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Available | 9 |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Jun 7, 2024 | CPU, GPU | Available | 1 |