SOTAVerified

Mixture-of-Experts Papers

Showing 251–275 of 1312 papers

Title | Status | Hype
Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining |  | 0
A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery |  | 0
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling |  | 0
Question-Aware Gaussian Experts for Audio-Visual Question Answering | Code | 1
BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification |  | 0
VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Code | 0
Convergence Rates for Softmax Gating Mixture of Experts |  | 0
Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Code | 1
Tabby: Tabular Data Synthesis with Language Models |  | 0
MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Code | 1
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Code | 0
How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model |  | 0
Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting |  | 0
ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition |  | 0
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models |  | 0
PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation |  | 0
Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion |  | 0
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering |  | 0
CoSMoEs: Compact Sparse Mixture of Experts |  | 0
Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems |  | 0
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook |  | 0
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Code | 1
Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks |  | 0
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization |  | 0
Page 11 of 53
