SOTAVerified

Mixture-of-Experts

Papers

Showing 926–950 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | | 0 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5 |
| SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0 |
| Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Code | 1 |
| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | | 0 |
| ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Code | 1 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2 |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Code | 1 |
| COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Code | 1 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | | 0 |
| Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | Code | 1 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | Code | 0 |
| Emergent Modularity in Pre-trained Transformers | Code | 1 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | | 0 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1 |
Page 38 of 53

No leaderboard results yet.