SOTAVerified

Mixture-of-Experts

Papers

Showing 251–300 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | | 0 |
| A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery | | 0 |
| Question-Aware Gaussian Experts for Audio-Visual Question Answering | Code | 1 |
| Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | | 0 |
| BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification | | 0 |
| VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Code | 0 |
| Convergence Rates for Softmax Gating Mixture of Experts | | 0 |
| Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Code | 1 |
| Tabby: Tabular Data Synthesis with Language Models | | 0 |
| MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Code | 1 |
| Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Code | 0 |
| How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model | | 0 |
| Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | | 0 |
| ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | | 0 |
| DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | | 0 |
| PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation | | 0 |
| Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0 |
| CoSMoEs: Compact Sparse Mixture of Experts | | 0 |
| Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | | 0 |
| R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Code | 1 |
| UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | | 0 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Code | 5 |
| Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | | 0 |
| Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | | 0 |
| OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | | 0 |
| Delta Decompression for MoE-based LLMs Compression | Code | 2 |
| The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | | 0 |
| ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | | 0 |
| BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | | 0 |
| Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | | 0 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Code | 2 |
| An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | | 0 |
| Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Code | 0 |
| Tight Clusters Make Specialized Experts | Code | 0 |
| Ray-Tracing for Conditionally Activated Neural Networks | | 0 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1 |
| Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | | 0 |
| DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | | 0 |
| Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | | 0 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Code | 7 |
| Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Code | 0 |
| Connector-S: A Survey of Connectors in Multi-modal Large Language Models | | 0 |
| How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | | 0 |
| ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | | 0 |
| Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time | | 0 |
| Probing Semantic Routing in Large Mixture-of-Expert Models | | 0 |
| Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Code | 0 |
| Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Code | 1 |
| Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | | 0 |
Page 6 of 27

No leaderboard results yet.