SOTAVerified

Mixture-of-Experts

Papers

Showing 1–50 of 1312 papers

| Title | Status | Hype |
|---|---|---|
| DeepSeek-V3 Technical Report | Code | 16 |
| Qwen2.5 Technical Report | Code | 13 |
| Qwen2 Technical Report | Code | 13 |
| A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications | Code | 9 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Code | 9 |
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | Code | 9 |
| Turbo Sparse: Achieving LLM SOTA Performance with Minimal Activated Parameters | Code | 9 |
| DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | Code | 9 |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7 |
| HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer | Code | 7 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7 |
| Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7 |
| MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7 |
| Kimi-VL Technical Report | Code | 5 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5 |
| Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5 |
| Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Code | 5 |
| Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5 |
| Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget | Code | 5 |
| LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training | Code | 5 |
| Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts | Code | 5 |
| Parrot: Multilingual Visual Instruction Tuning | Code | 5 |
| Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts | Code | 5 |
| Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5 |
| OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models | Code | 5 |
| DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models | Code | 5 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Code | 4 |
| Training Sparse Mixture Of Experts Text Embedding Models | Code | 4 |
| MoH: Multi-Head Attention as Mixture-of-Head Attention | Code | 4 |
| MoE++: Accelerating Mixture-of-Experts Methods with Zero-Computation Experts | Code | 4 |
| Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts | Code | 4 |
| OLMoE: Open Mixture-of-Experts Language Models | Code | 4 |
| Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models | Code | 4 |
| Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models | Code | 4 |
| JetMoE: Reaching Llama2 Performance with 0.1M Dollars | Code | 4 |
| Mixtral of Experts | Code | 4 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Code | 4 |
| DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale | Code | 4 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Code | 3 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Code | 3 |
| Generalizing Motion Planners with Mixture of Experts for Autonomous Driving | Code | 3 |
| LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Code | 3 |
| AnyGraph: Graph Foundation Model in the Wild | Code | 3 |
| YourMT3+: Multi-instrument Music Transcription with Enhanced Transformer Architectures and Cross-dataset Stem Augmentation | Code | 3 |
| A Survey on Mixture of Experts | Code | 3 |
| Reservoir History Matching of the Norne field with generative exotic priors and a coupled Mixture of Experts -- Physics Informed Neural Operator Forward Model | Code | 3 |
Page 1 of 27