SOTAVerified

Mixture-of-Experts

Papers

Showing 1051–1100 of 1312 papers

Title | Status | Hype
Quantitative Stock Investment by Routing Uncertainty-Aware Trading Experts: A Multi-Task Learning Approach | - | 0
Tutel: Adaptive Mixture-of-Experts at Scale | Code | 2
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts | - | 0
Interpretable Mixture of Experts | - | 0
Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Code | 1
Task-Specific Expert Pruning for Sparse Mixture-of-Experts | - | 0
Text2Human: Text-Driven Controllable Human Image Generation | Code | 2
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers | - | 0
Automatic Expert Selection for Multi-Scenario and Multi-Task Search | - | 0
Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts | Code | 0
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT | - | 0
MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services | - | 0
Pluralistic Image Completion with Probabilistic Mixture-of-Experts | - | 0
Unified Modeling of Multi-Domain Multi-Device ASR Systems | - | 0
Addressing Confounding Feature Issue for Causal Recommendation | Code | 1
ST-ExpertNet: A Deep Expert Framework for Traffic Prediction | - | 0
Optimizing Mixture of Experts using Dynamic Recompilations | - | 0
How Can Cross-lingual Knowledge Contribute Better to Fine-Grained Entity Typing? | - | 0
On the Representation Collapse of Sparse Mixture of Experts | - | 0
Residual Mixture of Experts | - | 0
Table-based Fact Verification with Self-adaptive Mixture of Experts | Code | 0
Towards Efficient Single Image Dehazing and Desnowing | - | 0
StableMoE: Stable Routing Strategy for Mixture of Experts | Code | 1
Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners | - | 0
Mixture of Experts for Biomedical Question Answering | - | 0
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Code | 1
Mixture-of-experts VAEs can disregard variation in surjective multimodal data | - | 0
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition | Code | 1
Learning to Adapt Clinical Sequences with Residual Mixture of Experts | Code | 0
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation | - | 0
On the Adaptation to Concept Drift for CTR Prediction | - | 0
Efficient Reflectance Capture with a Deep Gated Mixture-of-Experts | - | 0
Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Code | 1
Build a Robust QA System with Transformer-based Mixture of Experts | Code | 0
Efficient Language Modeling with Sparse all-MLP | - | 0
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Code | 1
SkillNet-NLU: A Sparsely Activated Model for General-Purpose Natural Language Understanding | - | 0
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Code | 1
Functional mixture-of-experts for classification | - | 0
Mixture-of-Experts with Expert Choice Routing | - | 0
ST-MoE: Designing Stable and Transferable Sparse Expert Models | Code | 3
A Survey on Dynamic Neural Networks for Natural Language Processing | - | 0
Physics-Guided Problem Decomposition for Scaling Deep Learning of High-dimensional Eigen-Solvers: The Case of Schrödinger's Equation | - | 0
One Student Knows All Experts Know: From Sparse to Dense | - | 0
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | - | 0
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale | Code | 0
Towards Lightweight Neural Animation : Exploration of Neural Network Pruning in Mixture of Experts-based Animation Models | - | 0
MDFEND: Multi-domain Fake News Detection | Code | 2
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Code | 1
Page 22 of 27

No leaderboard results yet.