SOTAVerified

Mixture-of-Experts

Papers

Showing 1–25 of 1312 papers

Title | Status | Hype
DeepSeek-V3 Technical Report | Code | 16
Qwen2.5 Technical Report | Code | 13
Qwen2 Technical Report | Code | 13
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9
Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9
DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9
A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Code | 9
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7
MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7
MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Code | 5
Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Code | 5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5
LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5
Kimi-VL Technical Report | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5
Page 1 of 53

No leaderboard results yet.