SOTAVerified

Mixture-of-Experts

Papers

Showing 176–200 of 1312 papers

Title | Status | Hype
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Code | 1
LIBMoE: A Library for Comprehensive Benchmarking of Mixture of Experts in Large Language Models | Code | 1
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1
PAD-Net: An Efficient Framework for Dynamic Networks | Code | 1
Learning to Skip the Middle Layers of Transformers | Code | 1
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Code | 1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Layerwise Recurrent Router for Mixture-of-Experts | Code | 1
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Code | 1
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Code | 1
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Code | 1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1
RetGen: A Joint Framework for Retrieval and Grounded Text Generation Modeling | Code | 1
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Code | 1
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Code | 1
Image Super-resolution via Latent Diffusion: A Sampling-space Mixture of Experts and Frequency-augmented Decoder Approach | Code | 1
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Code | 1
Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Code | 1
Addressing Confounding Feature Issue for Causal Recommendation | Code | 1
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Code | 1
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts | Code | 1
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Code | 1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1
Heterogeneous Multi-task Learning with Expert Diversity | Code | 1
Page 8 of 53

No leaderboard results yet.