SOTAVerified

Mixture-of-Experts

Papers

Showing 201–250 of 1312 papers

Title | Status | Hype
Norface: Improving Facial Expression Analysis by Identity Normalization | Code | 1
Swin SMT: Global Sequential Modeling in 3D Medical Image Segmentation | Code | 1
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Code | 1
Solving Token Gradient Conflict in Mixture-of-Experts for Large Vision-Language Model | Code | 1
AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models | Code | 1
Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Code | 1
MoE-RBench: Towards Building Reliable Language Models with Sparse Mixture-of-Experts | Code | 1
Towards Efficient Pareto Set Approximation via Mixture of Experts Based Model Fusion | Code | 1
DeepUnifiedMom: Unified Time-series Momentum Portfolio Construction via Multi-Task Learning with Multi-Gate Mixture of Experts | Code | 1
Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Code | 1
MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Code | 1
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | Code | 1
Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast | Code | 1
Graph Sparsification via Mixture of Graphs | Code | 1
Mixture of Experts Meets Prompt-Based Continual Learning | Code | 1
DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis | Code | 1
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models | Code | 1
M^4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | Code | 1
EWMoE: An effective model for global weather forecasting with mixture-of-experts | Code | 1
Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method | Code | 1
M3oE: Multi-Domain Multi-Task Mixture-of-Experts Recommendation Framework | Code | 1
Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing | Code | 1
Large Multi-modality Model Assisted AI-Generated Image Quality Assessment | Code | 1
XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Code | 1
Multi-Head Mixture-of-Experts | Code | 1
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | Code | 1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts | Code | 1
Unity by Diversity: Improved Representation Learning in Multimodal VAEs | Code | 1
DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Code | 1
Sequence-level Semantic Representation Fusion for Recommender Systems | Code | 1
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection | Code | 1
LLMBind: A Unified Modality-Task Integration Framework | Code | 1
HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts | Code | 1
Scaling physics-informed hard constraints with mixture-of-experts | Code | 1
BiMediX: Bilingual Medical Mixture of Experts LLM | Code | 1
Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization | Code | 1
Multimodal Clinical Trial Outcome Prediction with Large Language Models | Code | 1
Merging Multi-Task Models via Weight-Ensembling Mixture of Experts | Code | 1
Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters | Code | 1
Contrastive Learning and Mixture of Experts Enables Precise Vector Embeddings | Code | 1
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference | Code | 1
Frequency-Adaptive Pan-Sharpening with Mixture of Experts | Code | 1
FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Code | 1
When Parameter-efficient Tuning Meets General-purpose Vision-language Models | Code | 1
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention | Code | 1
Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts | Code | 1
HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Code | 1
Mixture-of-Linear-Experts for Long-term Time Series Forecasting | Code | 1
GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts | Code | 1
Page 5 of 27

No leaderboard results yet.