SOTAVerified

Mixture-of-Experts

Papers

Showing 351–375 of 1312 papers

Title | Status | Hype
GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with Guided Selection Vectors | Code | 0
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | - | 0
Single-Example Learning in a Mixture of GPDMs with Latent Geometries | - | 0
Load Balancing Mixture of Experts with Similarity Preserving Routers | - | 0
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Code | 0
Serving Large Language Models on Huawei CloudMatrix384 | - | 0
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | - | 0
GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | - | 0
MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | - | 0
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | - | 0
MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | - | 0
M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | - | 0
MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Code | 0
MIRA: Medical Time Series Foundation Model for Real-World Health Data | - | 0
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Code | 0
Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | - | 0
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | - | 0
Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | - | 0
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | - | 0
Enhancing Multimodal Continual Instruction Tuning with BranchLoRA | - | 0
Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis | - | 0
GradPower: Powering Gradients for Faster Language Model Pre-Training | - | 0
On the Expressive Power of Mixture-of-Experts for Structured Complex Tasks | - | 0
Mixture-of-Experts for Personalized and Semantic-Aware Next Location Prediction | - | 0
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents | Code | 0
Page 15 of 53

Leaderboard

No leaderboard results yet.