Mixture-of-Experts

Papers

Recently Added Most Hyped Most Active Needs Verification Most Verified

Showing 26–50 of 1312 papers

Title	Date	Tasks	Status	Hype
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral	Mar 4, 2024	Language ModelingLanguage Modelling	CodeCode Available	5
OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models	Jan 29, 2024	DecoderMixture-of-Experts	CodeCode Available	5
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models	Jan 11, 2024	Language ModellingLarge Language Model	CodeCode Available	5
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model	Jun 28, 2023	HallucinationKnowledge Graphs	CodeCode Available	5
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free	May 10, 2025	AttributeMixture-of-Experts	CodeCode Available	4
Training Sparse Mixture Of Experts Text Embedding Models	Feb 11, 2025	Mixture-of-ExpertsRAG	CodeCode Available	4
MoH: Multi-Head Attention as Mixture-of-Head Attention	Oct 15, 2024	Mixture-of-Experts	CodeCode Available	4
MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts	Oct 9, 2024	GPUMixture-of-Experts	CodeCode Available	4
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts	Sep 24, 2024	Computational EfficiencyMixture-of-Experts	CodeCode Available	4
OLMoE: Open Mixture-of-Experts Language Models	Sep 3, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models	Jul 2, 2024	Mixture-of-Expertsparameter-efficient fine-tuning	CodeCode Available	4
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models	Jun 3, 2024	Language ModelingLanguage Modelling	CodeCode Available	4
JetMoE: Reaching Llama2 Performance with 0.1M Dollars	Apr 11, 2024	GPUMixture-of-Experts	CodeCode Available	4
Mixtral of Experts	Jan 8, 2024	Code GenerationCommon Sense Reasoning	CodeCode Available	4
Fast Inference of Mixture-of-Experts Language Models with Offloading	Dec 28, 2023	Mixture-of-ExpertsQuantization	CodeCode Available	4
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale	Jun 30, 2022	CPUGPU	CodeCode Available	4
FlashDMoE: Fast Distributed MoE in a Single Kernel	Jun 5, 2025	16kCPU	CodeCode Available	3
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields	May 4, 2025	Mixture-of-ExpertsNeRF	CodeCode Available	3
A Survey on Inference Optimization Techniques for Mixture of Experts Models	Dec 18, 2024	Computational EfficiencyDistributed Computing	CodeCode Available	3
Generalizing Motion Planners with Mixture of Experts for Autonomous Driving	Oct 21, 2024	Autonomous DrivingData Augmentation	CodeCode Available	3
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation	Aug 28, 2024	Computational EfficiencyHallucination	CodeCode Available	3
AnyGraph: Graph Foundation Model in the Wild	Aug 20, 2024	Graph LearningMixture-of-Experts	CodeCode Available	3
YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation	Jul 5, 2024	Drum TranscriptionDrum Transcription in Music (DTM)	CodeCode Available	3
A Survey on Mixture of Experts	Jun 26, 2024	In-Context LearningMixture-of-Experts	CodeCode Available	3
Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model	Jun 2, 2024	DenoisingMixture-of-Experts	CodeCode Available	3

Show:10 25 50

← PrevPage 2 of 53Next →

No leaderboard results yet.