SOTA Verified

Mixture-of-Experts

Papers

Showing 151–200 of 1312 papers

Title | Status | Hype
StableFusion: Continual Video Retrieval via Frame Adaptation | Code | 1
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores | Code | 1
Question-Aware Gaussian Experts for Audio-Visual Question Answering | Code | 1
Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Code | 1
MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Code | 1
R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Code | 1
ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Code | 1
Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Code | 1
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Code | 1
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Code | 1
UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs | Code | 1
PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning | Code | 1
FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Code | 1
Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Code | 1
MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Code | 1
Modality Interactive Mixture-of-Experts for Fake News Detection | Code | 1
Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Code | 1
BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Code | 1
MedCoT: Medical Chain of Thought via Hierarchical Expert | Code | 1
Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Code | 1
SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts | Code | 1
RSUniVLM: A Unified Vision Language Model for Remote Sensing via Granularity-oriented Mixture of Experts | Code | 1
Condense, Don't Just Prune: Enhancing Efficiency and Performance in MoE Layer Pruning | Code | 1
Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Code | 1
LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Code | 1
DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction | Code | 1
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design | Code | 1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1
ST-MoE-BERT: A Spatial-Temporal Mixture-of-Experts Framework for Long-Term Cross-City Mobility Prediction | Code | 1
MomentumSMoE: Integrating Momentum into Sparse Mixture of Experts | Code | 1
GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Code | 1
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality | Code | 1
Mixture of Experts Made Personalized: Federated Prompt Learning for Vision-Language Models | Code | 1
Retraining-Free Merging of Sparse MoE via Hierarchical Clustering | Code | 1
Efficient Dictionary Learning with Switch Sparse Autoencoders | Code | 1
Model-GLUE: Democratized LLM Scaling for A Large Model Zoo in the Wild | Code | 1
Searching for Efficient Linear Layers over a Continuous Space of Structured Matrices | Code | 1
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Code | 1
Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE | Code | 1
LOLA -- An Open-Source Massively Multilingual Large Language Model | Code | 1
M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework | Code | 1
Gradient-free variational learning with conditional mixture networks | Code | 1
Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Code | 1
AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Code | 1
Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Code | 1
Layerwise Recurrent Router for Mixture-of-Experts | Code | 1
AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | Code | 1
MoExtend: Tuning New Experts for Modality and Task Extension | Code | 1
Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing | Code | 1
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Code | 1
Page 4 of 27

No leaderboard results yet.