SOTAVerified

Mixture-of-Experts

Papers

Showing 101–150 of 1,312 papers

Title | Status | Hype
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Code | 2
MoE-FFD: Mixture of Experts for Generalized and Parameter-Efficient Face Forgery Detection | Code | 2
Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Code | 2
Task-Customized Mixture of Adapters for General Image Fusion | Code | 2
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Code | 2
Scattered Mixture-of-Experts Implementation | Code | 2
Harder Tasks Need More Experts: Dynamic Routing in MoE Models | Code | 2
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts | Code | 2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Code | 2
Higher Layers Need More LoRA Experts | Code | 2
Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks | Code | 2
Aurora: Activating Chinese Chat Capability for Mixtral-8x7B Sparse Mixture-of-Experts through Instruction Tuning | Code | 2
LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin | Code | 2
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Code | 2
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Code | 2
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Code | 2
Fast Feedforward Networks | Code | 2
Motion In-Betweening with Phase Manifolds | Code | 2
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Code | 2
ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2
Learning A Sparse Transformer Network for Effective Image Deraining | Code | 2
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints | Code | 2
No Language Left Behind: Scaling Human-Centered Machine Translation | Code | 2
Towards Universal Sequence Representation Learning for Recommender Systems | Code | 2
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs | Code | 2
Tutel: Adaptive Mixture-of-Experts at Scale | Code | 2
Text2Human: Text-Driven Controllable Human Image Generation | Code | 2
MDFEND: Multi-domain Fake News Detection | Code | 2
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity | Code | 2
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer | Code | 2
Learning to Skip the Middle Layers of Transformers | Code | 1
Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Code | 1
SPACE: Your Genomic Profile Predictor is a Powerful DNA Foundation Model | Code | 1
Mastering Massive Multi-Task Reinforcement Learning via Mixture-of-Expert Decision Transformer | Code | 1
FLAME-MoE: A Transparent End-to-End Research Platform for Mixture-of-Experts Language Models | Code | 1
ThanoRA: Task Heterogeneity-Aware Multi-Task Low-Rank Adaptation | Code | 1
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Code | 1
U-SAM: An Audio Language Model for Unified Speech, Audio, and Music Understanding | Code | 1
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Code | 1
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Code | 1
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Code | 1
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Code | 1
Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification | Code | 1
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Code | 1
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Code | 1
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Code | 1
MoEDiff-SR: Mixture of Experts-Guided Diffusion Model for Region-Adaptive MRI Super-Resolution | Code | 1
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Code | 1
SPMTrack: Spatio-Temporal Parameter-Efficient Fine-Tuning with Mixture of Experts for Scalable Visual Tracking | Code | 1
Page 3 of 27