SOTAVerified

Mixture-of-Experts

Papers

Showing 451–500 of 1312 papers

Title | Status | Hype
Granger-causal Attentive Mixtures of Experts: Learning Important Features with Neural Networks | Code | 0
Graph Knowledge Distillation to Mixture of Experts | Code | 0
Elucidating Robust Learning with Uncertainty-Aware Corruption Pattern Estimation | Code | 0
Mixture of Link Predictors on Graphs | Code | 0
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0
Eliciting and Understanding Cross-Task Skills with Task-Level Mixture-of-Experts | Code | 0
Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Code | 0
Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Code | 0
Adversarial Mixture Of Experts with Category Hierarchy Soft Constraint | Code | 0
Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing | Code | 0
Mixture Content Selection for Diverse Sequence Generation | Code | 0
GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0
Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Code | 0
Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables | Code | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
MaskMoE: Boosting Token-Level Learning via Routing Mask in Mixture-of-Experts | Code | 0
Efficient and Interpretable Grammatical Error Correction with Mixture of Experts | Code | 0
Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Code | 0
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Code | 0
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Code | 0
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Code | 0
A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP | Code | 0
DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Code | 0
Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Code | 0
A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Code | 0
DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Code | 0
BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Code | 0
Hierarchical Mixtures of Generators for Adversarial Learning | Code | 0
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0
DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning | Code | 0
Lifelong Mixture of Variational Autoencoders | Code | 0
Learning to Adapt Clinical Sequences with Residual Mixture of Experts | Code | 0
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0
Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization | Code | 0
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0
Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks | Code | 0
Learning CHARME models with neural networks | Code | 0
Learning Gating ConvNet for Two-Stream based Methods in Action Recognition | Code | 0
Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization | Code | 0
Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Code | 0
k-Winners-Take-All Ensemble Neural Network | Code | 0
Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Code | 0
A Mixture-of-Experts Model for Learning Multi-Facet Entity Embeddings | Code | 0
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Code | 0
Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Code | 0
A Mixture-of-Experts Model for Antonym-Synonym Discrimination | Code | 0
Discontinuity-Sensitive Optimal Control Learning by Mixture of Experts | Code | 0
Intrinsic User-Centric Interpretability through Global Mixture of Experts | Code | 0
Page 10 of 27

No leaderboard results yet.