Mixture-of-Experts

Papers

Showing 11-20 of 1312 papers

Title | Status | Hype
MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7
MiniMax-01: Scaling Foundation Models with Lightning Attention | Code | 7
Revisiting MoE and Dense Speed-Accuracy Comparisons for LLM Training | Code | 7
MoE-LLaVA: Mixture of Experts for Large Vision-Language Models | Code | 7
Kimi-VL Technical Report | Code | 5
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent | Code | 5
Moirai-MoE: Empowering Time Series Foundation Models with Sparse Mixture of Experts | Code | 5
Aria: An Open Multimodal Native Mixture-of-Experts Model | Code | 5
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Code | 5
Page 2 of 132

No leaderboard results yet.