SOTAVerified

Mixture-of-Experts

Papers

Showing 276-300 of 1312 papers

Title | Status | Hype
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Code | 1
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Code | 1
Mixture of Decision Trees for Interpretable Machine Learning | Code | 1
Spatial Mixture-of-Experts | Code | 1
PAD-Net: An Efficient Framework for Dynamic Networks | Code | 1
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Code | 1
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Code | 1
Mixture of Attention Heads: Selecting Attention Heads Per Token | Code | 1
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Code | 1
Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Code | 1
Towards Understanding Mixture of Experts in Deep Learning | Code | 1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1
Sparse Mixture-of-Experts are Domain Generalizable Learners | Code | 1
Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Code | 1
Addressing Confounding Feature Issue for Causal Recommendation | Code | 1
StableMoE: Stable Routing Strategy for Mixture of Experts | Code | 1
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Code | 1
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition | Code | 1
Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Code | 1
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Code | 1
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Code | 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Code | 1
Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Code | 1
Unsupervised Foreground Extraction via Deep Region Competition | Code | 1
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Code | 1

No leaderboard results yet.