SOTAVerified

Mixture-of-Experts

Papers

Showing 651–700 of 1312 papers

Title | Status | Hype
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | - | 0
Mixture of Experts in a Mixture of RL settings | - | 0
MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias | - | 0
Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Code | 0
Theory on Mixture-of-Experts in Continual Learning | - | 0
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Code | 0
SimSMoE: Solving Representational Collapse via Similarity Measure | - | 0
Low-Rank Mixture-of-Experts for Continual Medical Image Segmentation | - | 0
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Code | 1
P-Tailor: Customizing Personality Traits for Language Models via Mixture of Specialized LoRA Experts | - | 0
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0
Variational Distillation of Diffusion Policies into Mixture of Experts | - | 0
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
Not Eliminate but Aggregate: Post-Hoc Control over Mixture-of-Experts to Address Shortcut Shifts in Natural Language Understanding | Code | 0
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Code | 1
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
Graph Knowledge Distillation to Mixture of Experts | Code | 0
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Code | 1
Interpretable Cascading Mixture-of-Experts for Urban Traffic Congestion Prediction | - | 0
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Code | 1
DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Code | 1
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Code | 2
Style Mixture of Experts for Expressive Text-To-Speech Synthesis | - | 0
Continual Traffic Forecasting via Mixture of Experts | - | 0
Node-wise Filtering in Graph Neural Networks: A Mixture of Experts Approach | - | 0
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models | - | 0
Parrot: Multilingual Visual Instruction Tuning | Code | 5
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Code | 2
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Code | 3
Optimizing 6G Integrated Sensing and Communications (ISAC) via Expert Networks | - | 0
A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers | Code | 0
Training-efficient density quantum machine learning | - | 0
MEMoE: Enhancing Model Editing with Mixture of Experts Adaptors | - | 0
Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization | Code | 0
MoNDE: Mixture of Near-Data Experts for Large-Scale Sparse Models | - | 0
LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design | - | 0
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers | Code | 2
Yuan 2.0-M32: Mixture of Experts with Attention Router | Code | 2
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | Code | 1
A Provably Effective Method for Pruning Experts in Fine-tuned Sparse Mixture-of-Experts | - | 0
Decomposing the Neurons: Activation Sparsity via Mixture of Experts for Continual Test Time Adaptation | Code | 2
MoEUT: Mixture-of-Experts Universal Transformers | Code | 2
Expert-Token Resonance: Redefining MoE Routing through Affinity-Driven Active Selection | - | 0
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts | - | 0
Page 14 of 27

No leaderboard results yet.