SOTAVerified

Mixture-of-Experts

Papers

Showing 151–200 of 1312 papers

Title | Status | Hype
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Code | 1
Mixture of Experts Meets Prompt-Based Continual Learning | Code | 1
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | Code | 1
MiCE: Mixture of Contrastive Experts for Unsupervised Image Clustering | Code | 1
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Code | 1
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | Code | 1
Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Code | 1
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators | Code | 1
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | Code | 1
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Code | 1
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Code | 1
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Code | 1
MedCoT: Medical Chain of Thought via Hierarchical Expert | Code | 1
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Code | 1
Mixture-of-Linear-Experts for Long-term Time Series Forecasting | Code | 1
M^4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | Code | 1
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Code | 1
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Code | 1
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Code | 1
LLMBind: A Unified Modality-Task Integration Framework | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Code | 1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
PAD-Net: An Efficient Framework for Dynamic Networks | Code | 1
Learning to Skip the Middle Layers of Transformers | Code | 1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Cross-Domain Label-Adaptive Stance Detection | Code | 1
LIBMoE: A Library for Comprehensive Benchmarking Mixture of Experts in Large Language Models | Code | 1
Manifold Induced Biases for Zero-shot and Few-shot Detection of Generated Images | Code | 1
Layerwise Recurrent Router for Mixture-of-Experts | Code | 1
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment | Code | 1
Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Code | 1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1
LOLA -- An Open-Source Massively Multilingual Large Language Model | Code | 1
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Code | 1
M3oE: Multi-Domain Multi-Task Mixture-of-Experts Recommendation Framework | Code | 1
Addressing Confounding Feature Issue for Causal Recommendation | Code | 1
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Code | 1
C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing | Code | 1
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Code | 1
RetGen: A Joint Framework for Retrieval and Grounded Text Generation Modeling | Code | 1
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Code | 1
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Code | 1
COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Code | 1
Image Super-resolution via Latent Diffusion: A Sampling-space Mixture of Experts and Frequency-augmented Decoder Approach | Code | 1
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Code | 1
DAMEX: Dataset-aware Mixture-of-Experts for Visual Understanding of Mixture-of-Datasets | Code | 1
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1
Page 4 of 27
