| Paper | Date | Tasks | Code | Likes | Rating |
|---|---|---|---|---|---|
| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Available | 16 | 5 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Available | 13 | 5 |
| Qwen2 Technical Report | Jul 15, 2024 | Arithmetic Reasoning, GSM8K | Available | 13 | 5 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Jun 17, 2024 | 16k, Language Modeling | Available | 9 | 5 |
| DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | May 7, 2024 | Language Modeling | Available | 9 | 5 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Available | 9 | 5 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Jun 10, 2024 | Mixture-of-Experts | Available | 9 | 5 |
| A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Mar 10, 2025 | Continual Learning, Meta-Learning | Available | 9 | 5 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | May 28, 2025 | Image Generation, Mixture-of-Experts | Available | 7 | 5 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Feb 18, 2025 | Mixture-of-Experts | Available | 7 | 5 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Jan 14, 2025 | Mixture-of-Experts | Available | 7 | 5 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Jan 29, 2024 | Hallucination, Mixture-of-Experts | Available | 7 | 5 |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Jun 16, 2025 | Mixture-of-Experts, Reinforcement Learning (RL) | Available | 7 | 5 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | May 23, 2024 | GSM8K, Mixture-of-Experts | Available | 7 | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Nov 4, 2024 | Logical Reasoning, Mathematical Problem-Solving | Available | 5 | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | Hallucination, Knowledge Graphs | Available | 5 | 5 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Jun 18, 2024 | Language Modeling | Available | 5 | 5 |
| Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Oct 14, 2024 | Mixture-of-Experts, Time Series | Available | 5 | 5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Jan 11, 2024 | Language Modeling, Large Language Model | Available | 5 | 5 |
| Kimi-VL Technical Report | Apr 10, 2025 | Long-Context Understanding, Mathematical Reasoning | Available | 5 | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Jan 29, 2024 | Decoder, Mixture-of-Experts | Available | 5 | 5 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Jun 24, 2024 | Mixture-of-Experts | Available | 5 | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Available | 5 | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Oct 8, 2024 | Instruction Following, Mixture-of-Experts | Available | 5 | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Available | 5 | 5 |
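Most entries above build on sparse Mixture-of-Experts layers, where a learned router activates only a few expert MLPs per token. As a quick orientation, here is a minimal NumPy sketch of top-k expert routing; the dimensions, the ReLU experts, and the softmax-over-selected-logits gating are illustrative assumptions, not the recipe of any specific paper listed, and production systems add load-balancing losses, capacity limits, and expert parallelism.

```python
# Minimal sketch of top-k expert routing in a sparse MoE layer.
# All sizes and weights are hypothetical; this illustrates the routing
# mechanism only, not any particular model's configuration.
import numpy as np

rng = np.random.default_rng(0)

D, H, N_EXPERTS, TOP_K = 16, 32, 8, 2  # hidden dim, FFN dim, experts, active experts

# One two-layer MLP per expert (random weights for the sketch).
experts = [
    (rng.standard_normal((D, H)) / np.sqrt(D),
     rng.standard_normal((H, D)) / np.sqrt(H))
    for _ in range(N_EXPERTS)
]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)  # gating projection

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route each token to its TOP_K highest-scoring experts and mix outputs."""
    logits = x @ router                            # (tokens, N_EXPERTS)
    top = np.argsort(logits, axis=-1)[:, -TOP_K:]  # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                    # per token; vectorized in practice
        sel = logits[t, top[t]]
        gate = np.exp(sel - sel.max())
        gate /= gate.sum()                         # softmax over selected experts only
        for w, e in zip(gate, top[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)  # ReLU MLP expert
    return out

tokens = rng.standard_normal((4, D))
print(moe_forward(tokens).shape)  # (4, 16): output shape matches the input
```

Normalizing the gate over only the selected logits is one common design choice; several of the models above explore variants, such as shared always-on experts or alternative gating functions.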