SOTAVerified

Mixture-of-Experts

Papers

Showing 1–50 of 1312 papers

| Title | Status | Hype |
|---|---|---|
| DeepSeek-V3 Technical Report | Code | 16 |
| Qwen2.5 Technical Report | Code | 13 |
| Qwen2 Technical Report | Code | 13 |
| A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Code | 9 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9 |
| DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9 |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7 |
| Kimi-VL Technical Report | Code | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5 |
| Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Code | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5 |
| Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Code | 5 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5 |
| Parrot: Multilingual Visual Instruction Tuning | Code | 5 |
| Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Code | 5 |
| Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Code | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Code | 4 |
| Training Sparse Mixture Of Experts Text Embedding Models | Code | 4 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Code | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4 |
| Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Code | 4 |
| OLMoE: Open Mixture-of-Experts Language Models | Code | 4 |
| Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Code | 4 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Code | 4 |
| Mixtral of Experts | Code | 4 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Code | 4 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Code | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3 |
| Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Code | 3 |
| LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3 |
| AnyGraph: Graph Foundation Model in the Wild | Code | 3 |
| YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Code | 3 |
| A Survey on Mixture of Experts | Code | 3 |
| Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Code | 3 |
Page 1 of 27