| Paper | Date | Tasks | Code |
| --- | --- | --- | --- |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available |
| Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Jun 13, 2025 | Image Super-Resolution, Mixture-of-Experts | Code Available |
| SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Jun 2, 2025 | Mixture-of-Experts, Unsupervised Pre-training | Code Available |
| Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | May 30, 2025 | Mixture-of-Experts | Code Available |
| FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | May 26, 2025 | Mixture-of-Experts | Code Available |
| ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | May 24, 2025 | Mixture-of-Experts | Code Available |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPU, Long-range modeling | Code Available |
| U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding | May 20, 2025 | cross-modal alignment, Language Modeling | Code Available |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | May 19, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available |
| MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | May 9, 2025 | Mixture-of-Experts, Quantization | Code Available |
| Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | May 1, 2025 | Mixture-of-Experts | Code Available |
| Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification | Apr 21, 2025 | Exemplar-Free, Knowledge Distillation | Code Available |
| Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Apr 21, 2025 | Mixture-of-Experts | Code Available |
| Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Apr 16, 2025 | Mixture-of-Experts | Code Available |
| C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Apr 10, 2025 | In-Context Learning, Mixture-of-Experts | Code Available |
| MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Apr 9, 2025 | Computational Efficiency, Denoising | Code Available |
| MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Apr 3, 2025 | Mixture-of-Experts, Quantization | Code Available |
| SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Mar 24, 2025 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available |
| Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Apr 16, 2024 | Image Classification | Code Available |
| MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection | Apr 12, 2024 | Mixture-of-Experts | Code Available |
| Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Mar 26, 2024 | Decoder, Mixture-of-Experts | Code Available |
| Task-Customized Mixture of Adapters for General Image Fusion | Mar 19, 2024 | Mixture-of-Experts | Code Available |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available |
| Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Mar 14, 2024 | Denoising, Mixture-of-Experts | Code Available |
| Scattered Mixture-of-Experts Implementation | Mar 13, 2024 | Mixture-of-Experts | Code Available |
| Harder Tasks Need More Experts: Dynamic Routing in MoE Models | Mar 12, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available |
| TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts | Mar 5, 2024 | Graph Attention, Graph Embedding | Code Available |
| Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Feb 22, 2024 | Mixture-of-Experts | Code Available |
| Higher Layers Need More LoRA Experts | Feb 13, 2024 | Mixture-of-Experts | Code Available |
| Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Jan 5, 2024 | Arithmetic Reasoning, Code Generation | Code Available |
| Aurora: Activating Chinese Chat Capability for Mixtral-8x7B Sparse Mixture-of-Experts through Instruction-Tuning | Dec 22, 2023 | Instruction Following, Mixture-of-Experts | Code Available |
| LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin | Dec 15, 2023 | Language Modelling, Mixture-of-Experts | Code Available |
| QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Oct 25, 2023 | GPU, Mixture-of-Experts | Code Available |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Oct 24, 2023 | Language Modelling, Large Language Model | Code Available |
| Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Sep 11, 2023 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available |
| Fast Feedforward Networks | Aug 28, 2023 | Mixture-of-Experts | Code Available |
| Motion In-Betweening with Phase Manifolds | Aug 24, 2023 | Mixture-of-Experts, motion in-betweening | Code Available |
| TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Jul 28, 2023 | Long-range modeling, Mixture-of-Experts | Code Available |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available |
| Learning A Sparse Transformer Network for Effective Image Deraining | Mar 21, 2023 | Image Reconstruction, Image Restoration | Code Available |
| Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | Dec 9, 2022 | Mixture-of-Experts | Code Available |
| No Language Left Behind: Scaling Human-Centered Machine Translation | Jul 11, 2022 | Machine Translation, Mixture-of-Experts | Code Available |
| Towards Universal Sequence Representation Learning for Recommender Systems | Jun 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available |
| Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Jun 9, 2022 | Image Captioning, Image Classification | Code Available |
| Tutel: Adaptive Mixture-of-Experts at Scale | Jun 7, 2022 | Mixture-of-Experts, Object Detection | Code Available |
| Text2Human: Text-Driven Controllable Human Image Generation | May 31, 2022 | Diversity, Human Parsing | Code Available |
| MDFEND: Multi-domain Fake News Detection | Jan 4, 2022 | Fake News Detection, Mixture-of-Experts | Code Available |
| Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Jan 11, 2021 | Language Modelling, Mixture-of-Experts | Code Available |
| Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Jan 23, 2017 | Computational Efficiency, GPU | Code Available |
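The common thread through most of these entries is the sparsely-gated MoE layer introduced in "Outrageously Large Neural Networks" and simplified to top-1 routing in Switch Transformers: a learned gate scores every expert per token, only the top-k experts run, and their outputs are combined with the renormalized gate weights. For orientation, here is a minimal PyTorch sketch of that routing pattern. It is illustrative only: the class name, dimensions, and the Python loop over experts are our choices, not code from any listed paper, and it omits the auxiliary load-balancing loss that production systems add. Real implementations (e.g., Tutel or the ScatterMoE work above) replace the loop with fused scatter/gather kernels.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal sparsely-gated MoE layer with top-k token routing (illustrative)."""

    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router: linear map from token representation to per-expert logits.
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        # Experts: independent feed-forward blocks; only k of them run per token.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model)
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.size(-1))              # (n_tokens, d_model)
        logits = self.gate(tokens)                      # (n_tokens, num_experts)
        weights, chosen = torch.topk(logits, self.k, dim=-1)
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen k
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            token_idx, slot = torch.where(chosen == e)  # tokens routed to expert e
            if token_idx.numel() == 0:
                continue
            # topk returns distinct experts per token, so token_idx has no duplicates
            # and the indexed in-place add is safe.
            out[token_idx] += weights[token_idx, slot].unsqueeze(-1) * expert(tokens[token_idx])
        return out.reshape_as(x)

# Example: route a batch of 4 sequences of 16 tokens through 8 experts, 2 per token.
moe = TopKMoE(d_model=64, d_hidden=256)
y = moe(torch.randn(4, 16, 64))
assert y.shape == (4, 16, 64)
```

Setting k=1 recovers Switch-style routing; each token then activates a single expert, so compute per token stays roughly constant as the expert count grows.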