SOTAVerified

Mixture-of-Experts

Papers

Showing 651–675 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | | 0 |
| Mixture of Experts in a Mixture of RL settings | | 0 |
| MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias | | 0 |
| Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Code | 0 |
| Theory on Mixture-of-Experts in Continual Learning | | 0 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5 |
| OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Code | 0 |
| SimSMoE: Solving Representational Collapse via Similarity Measure | | 0 |
| Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation | | 0 |
| AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Code | 1 |
| P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0 |
| Variational Distillation of Diffusion Policies into Mixture of Experts | | 0 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5 |
| Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding | Code | 0 |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Code | 1 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9 |
| Graph Knowledge Distillation to Mixture of Experts | Code | 0 |
| MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Code | 1 |
| Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction | | 0 |
| Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Code | 1 |
| DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Code | 1 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9 |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Code | 1 |
Page 27 of 53

No leaderboard results yet.