Mixture-of-Experts

Papers


Showing 176–200 of 1312 papers

Title	Date	Tasks	Status	Hype
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design	Oct 26, 2022	Mixture-of-Experts, Multi-Task Learning	Code Available	1
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models	Nov 1, 2024	Benchmarking, Mixture-of-Experts	Code Available	1
Lifting the Curse of Capacity Gap in Distilling Language Models	May 20, 2023	Knowledge Distillation, Mixture-of-Experts	Code Available	1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts	Jul 24, 2022	Deep Reinforcement Learning, Humanoid Control	Code Available	1
PAD-Net: An Efficient Framework for Dynamic Networks	Nov 10, 2022	Image Classification	Code Available	1
Learning to Skip the Middle Layers of Transformers	Jun 26, 2025	Mixture-of-Experts	Code Available	1
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts	Feb 10, 2020	Language Modelling, Mixture-of-Experts	Code Available	1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model	Feb 20, 2025	Mixture-of-Experts, Question Answering	Code Available	1
Layerwise Recurrent Router for Mixture-of-Experts	Aug 13, 2024	Attribute, Mixture-of-Experts	Code Available	1
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis	Jul 24, 2024	Mixture-of-Experts, Multiple Instance Learning	Code Available	1
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model	May 22, 2025	GPU, Long-range modeling	Code Available	1
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference	Feb 6, 2025	Mixture-of-Experts	Code Available	1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset	Oct 21, 2024	Image Dehazing, Mamba	Code Available	1
RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling	May 14, 2021	Dialogue Generation, Language Modeling	Code Available	1
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE	Feb 10, 2025	Diversity, Language Modeling	Code Available	1
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts	Dec 12, 2023	Mixture-of-Experts	Code Available	1
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach	Oct 18, 2023	Blind Super-Resolution, Decoder	Code Available	1
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models	Jun 19, 2024	ARC, Mixture-of-Experts	Code Available	1
Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction	Aug 26, 2020	Interpretable Machine Learning, Mixture-of-Experts	Code Available	1
Addressing Confounding Feature Issue for Causal Recommendation	May 13, 2022	Mixture-of-Experts, Recommendation Systems	Code Available	1
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing	Apr 10, 2025	In-Context Learning, Mixture-of-Experts	Code Available	1
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts	Feb 20, 2024	Mixture-of-Experts, Multi-Task Learning	Code Available	1
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer	May 30, 2025	Mixture-of-Experts	Code Available	1
Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss	Sep 9, 2021	Mixture-of-Experts, Retrieval	Code Available	1
Heterogeneous Multi-task Learning with Expert Diversity	Jun 20, 2021	Diversity, Mixture-of-Experts	Code Available	1

