Mixture-of-Experts

Papers

Showing 1–50 of 1312 papers

Title | Status | Hype
DeepSeek-V3 Technical Report | Code | 16
Qwen2.5 Technical Report | Code | 13
Qwen2 Technical Report | Code | 13
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Code | 9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7
MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7
MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7
Parrot: Multilingual Visual Instruction Tuning | Code | 5
Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Code | 5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
Kimi-VL Technical Report | Code | 5
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Code | 5
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Code | 5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Code | 5
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4
JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Code | 4
OLMoE: Open Mixture-of-Experts Language Models | Code | 4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4
Training Sparse Mixture Of Experts Text Embedding Models | Code | 4
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Code | 4
Mixtral of Experts | Code | 4
Fast Inference of Mixture-of-Experts Language Models with Offloading | Code | 4
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Code | 4
MoH: Multi-Head Attention as Mixture-of-Head Attention | Code | 4
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Code | 4
BlackMamba: Mixture of Experts for State-Space Models | Code | 3
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Code | 3
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Code | 3
MVMoE: Multi-Task Vehicle Routing Solver with Mixture-of-Experts | Code | 3
A Survey on Mixture of Experts | Code | 3
A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3
AnyGraph: Graph Foundation Model in the Wild | Code | 3
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Code | 3
MoAI: Mixture of All Intelligence for Large Language and Vision Models | Code | 3
Page 1 of 27

No leaderboard results yet.