SOTAVerified

Mixture-of-Experts

Papers

Showing 1001–1050 of 1312 papers

| Title | Status | Hype |
|---|---|---|
| An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | | 0 |
| SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0 |
| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | | 0 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | Code | 0 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | | 0 |
| Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts | | 0 |
| Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition | | 0 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0 |
| Demystifying Softmax Gating Function in Gaussian Mixture of Experts | | 0 |
| Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising | | 0 |
| Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity | Code | 0 |
| Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | | 0 |
| Revisiting Single-gated Mixtures of Experts | | 0 |
| FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | | 0 |
| Mixed Regression via Approximate Message Passing | | 0 |
| Steered Mixture of Experts Regression for Image Denoising with Multi-Model-Inference | | 0 |
| Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills | Code | 0 |
| WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal | | 0 |
| Disguise without Disruption: Utility-Preserving Face De-Identification | | 0 |
| Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset | | 0 |
| HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals | | 0 |
| MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | | 0 |
| Scaling Vision-Language Models with Sparse Mixture of Experts | | 0 |
| A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | | 0 |
| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | | 0 |
| Improving Expert Specialization in Mixture of Experts | | 0 |
| Improved Training of Mixture-of-Experts Language GANs | | 0 |
| TMoE-P: Towards the Pareto Optimum for Multivariate Soft Sensors | | 0 |
| Massively Multilingual Shallow Fusion with Large Language Models | | 0 |
| Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective | | 0 |
| Alternating Updates for Efficient Transformers | | 0 |
| PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets | | 0 |
| AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction | | 0 |
| Covariate-guided Bayesian mixture model for multivariate time series | Code | 0 |
| Semantic-Aware Dynamic Parameter for Video Inpainting Transformer | | 0 |
| Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners | | 0 |
| AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts | | 0 |
| Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model | | 0 |
| MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion | | 0 |
Page 21 of 27

No leaderboard results yet.