SOTAVerified

Mixture-of-Experts

Papers

Showing 451–475 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| Effective Approaches to Batch Parallelization for Dynamic Neural Network Architectures | Code | 0 |
| m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Code | 0 |
| EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Code | 0 |
| A multi-scale lithium-ion battery capacity prediction using mixture of experts and patch-based MLP | Code | 0 |
| DynMoLE: Boosting Mixture of LoRA Experts Fine-Tuning with a Hybrid Routing Mechanism | Code | 0 |
| Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Code | 0 |
| A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Code | 0 |
| LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Code | 0 |
| GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | Code | 0 |
| Guiding the Experts: Semantic Priors for Efficient and Focused MoE Routing | Code | 0 |
| GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with GuidedSelection Vectors | Code | 0 |
| GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory | Code | 0 |
| DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Code | 0 |
| BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Code | 0 |
| Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0 |
| DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning | Code | 0 |
| Lifelong Mixture of Variational Autoencoders | Code | 0 |
| Learning Mixture-of-Experts for General-Purpose Black-Box Discrete Optimization | Code | 0 |
| Learning Deep Mixtures of Gaussian Process Experts Using Sum-Product Networks | Code | 0 |
| Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0 |
| Learning Gating ConvNet for Two-Stream based Methods in Action Recognition | Code | 0 |
| Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0 |
| Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0 |
| Hierarchical Deep Recurrent Architecture for Video Understanding | Code | 0 |
| Learning a Mixture of Granularity-Specific Experts for Fine-Grained Categorization | Code | 0 |
Page 19 of 53

No leaderboard results yet.