SOTAVerified

Mixture-of-Experts

Papers

Showing 301–350 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Code | 1 |
| Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Code | 1 |
| DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis | Code | 1 |
| Specialized federated learning using a mixture of experts | Code | 1 |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Code | 1 |
| Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Code | 1 |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Code | 1 |
| Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Code | 1 |
| Norface: Improving Facial Expression Analysis by Identity Normalization | Code | 1 |
| Sequence-level Semantic Representation Fusion for Recommender Systems | Code | 1 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Code | 1 |
| Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Code | 1 |
| Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Code | 1 |
| Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters | Code | 1 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1 |
| EWMoE: An effective model for global weather forecasting with mixture-of-experts | Code | 1 |
| FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Code | 1 |
| EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Code | 1 |
| Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Code | 1 |
| Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Code | 1 |
| MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Code | 1 |
| MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Code | 1 |
| Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild | Code | 1 |
| Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Code | 1 |
| Sparse Universal Transformer | Code | 1 |
| Multi-Source Domain Adaptation with Mixture of Experts | Code | 0 |
| Multimodal Fusion Strategies for Mapping Biophysical Landscape Features | Code | 0 |
| Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | Code | 0 |
| DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | Code | 0 |
| Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Code | 0 |
| Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs | Code | 0 |
| MoVEInt: Mixture of Variational Experts for Learning Human-Robot Interactions from Demonstrations | Code | 0 |
| Mosaic: Data-Free Knowledge Distillation via Mixture-of-Experts for Heterogeneous Distributed Environments | Code | 0 |
| A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Code | 0 |
| More Experts Than Galaxies: Conditionally-overlapping Experts With Biologically-Inspired Fixed Routing | Code | 0 |
| MoLEx: Mixture of Layer Experts for Finetuning with Sparse Upcycling | Code | 0 |
| Adaptive 3D descattering with a dynamic synthesis network | Code | 0 |
| Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Code | 0 |
| MoNTA: Accelerating Mixture-of-Experts Training with Network-Traffc-Aware Parallel Optimization | Code | 0 |
| DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | Code | 0 |
| DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts | Code | 0 |
| MOoSE: Multi-Orientation Sharing Experts for Open-set Scene Text Recognition | Code | 0 |
| A Bird's-eye View of Reranking: from List Level to Page Level | Code | 0 |
| A Teacher Is Worth A Million Instructions | Code | 0 |
| MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Code | 0 |
| A Survey on Prompt Tuning | Code | 0 |
| Covariate-guided Bayesian mixture model for multivariate time series | Code | 0 |
| MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | Code | 0 |
| Countering Mainstream Bias via End-to-End Adaptive Local Learning | Code | 0 |
| Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Code | 0 |

Leaderboard

No leaderboard results yet.