SOTAVerified

Mixture-of-Experts

Papers

Showing 1201–1250 of 1312 papers

Title | Status | Hype
Self-Routing Capsule Networks | Code | 0
ASEM: Enhancing Empathy in Chatbot through Attention-based Sentiment and Emotion Modeling | Code | 0
Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Code | 0
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | Code | 0
DA-MoE: Addressing Depth-Sensitivity in Graph-Level Analysis through Mixture of Experts | Code | 0
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Code | 0
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | Code | 0
Sequential Gaussian Processes for Online Learning of Nonstationary Functions | Code | 0
Self-Supervised Multimodal Domino: in Search of Biomarkers for Alzheimer's Disease | Code | 0
OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser | Code | 0
Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Code | 0
Video Relationship Detection Using Mixture of Experts | Code | 0
Graph Knowledge Distillation to Mixture of Experts | Code | 0
Tensor-variate Mixture of Experts for Proportional Myographic Control of a Robotic Hand | Code | 0
Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Code | 0
Granger-causal Attentive Mixtures of Experts: Learning Important Features with Neural Networks | Code | 0
Adversarial Mixture Of Experts with Category Hierarchy Soft Constraint | Code | 0
A non-asymptotic approach for model selection via penalization in high-dimensional mixture of experts models | Code | 0
Covariate-guided Bayesian mixture model for multivariate time series | Code | 0
Mixture Content Selection for Diverse Sequence Generation | Code | 0
Countering Mainstream Bias via End-to-End Adaptive Local Learning | Code | 0
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts | Code | 0
Peirce in the Machine: How Mixture of Experts Models Perform Hypothesis Construction | Code | 0
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0
Completed Feature Disentanglement Learning for Multimodal MRIs Analysis | Code | 0
Skeleton-Based Human Action Recognition with Noisy Labels | Code | 0
UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Code | 0
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Code | 0
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving | Code | 0
FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models | Code | 0
From Knowledge to Noise: CTIM-Rover and the Pitfalls of Episodic Memory in Software Engineering Agents | Code | 0
A Gaussian Process-based Streaming Algorithm for Prediction of Time Series With Regimes and Outliers | Code | 0
Anomaly Detection by Recombining Gated Unsupervised Experts | Code | 0
SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Code | 0
Finger Pose Estimation for Under-screen Fingerprint Sensor | Code | 0
pFedMoE: Data-Level Personalization with Mixture of Experts for Model-Heterogeneous Personalized Federated Learning | Code | 0
FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models | Code | 0
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Code | 0
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Code | 0
Fast filtering of non-Gaussian models using Amortized Optimal Transport Maps | Code | 0
A Gated Residual Kolmogorov-Arnold Networks for Mixtures of Experts | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0
Universal Simultaneous Machine Translation with Mixture-of-Experts Wait-k Policy | Code | 0
Tight Clusters Make Specialized Experts | Code | 0
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | Code | 0
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors | Code | 0
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Code | 0
FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Code | 0
Page 25 of 27

No leaderboard results yet.