SOTAVerified

Mixture-of-Experts

Papers

Showing 251-300 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| LOLA -- An Open-Source Massively Multilingual Large Language Model | Code | 1 |
| Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Code | 1 |
| LLMBind: A Unified Modality-Task Integration Framework | Code | 1 |
| Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing | Code | 1 |
| BiMediX: Bilingual Medical Mixture of Experts LLM | Code | 1 |
| LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1 |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1 |
| Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Code | 1 |
| LIBMoE: A Library for comprehensive benchmarking Mixture of Experts in Large Language Models | Code | 1 |
| Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters | Code | 1 |
| Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1 |
| M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework | Code | 1 |
| MEFT: Memory-Efficient Fine-Tuning through Sparse Adapter | Code | 1 |
| Layerwise Recurrent Router for Mixture-of-Experts | Code | 1 |
| RetGen: A Joint framework for Retrieval and Grounded Text Generation Modeling | Code | 1 |
| DMT-HI: MOE-based Hyperbolic Interpretable Deep Manifold Transformation for Unspervised Dimensionality Reduction | Code | 1 |
| DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Code | 1 |
| Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Code | 1 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | Code | 1 |
| HyperMoE: Towards Better Mixture of Experts via Transferring Among Experts | Code | 1 |
| HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Code | 1 |
| HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Code | 1 |
| Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Code | 1 |
| Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Code | 1 |
| Distribution-aware Forgetting Compensation for Exemplar-Free Lifelong Person Re-identification | Code | 1 |
| HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Code | 1 |
| Heterogeneous Multi-task Learning with Expert Diversity | Code | 1 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Code | 1 |
| Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Code | 1 |
| Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Code | 1 |
| Go Wider Instead of Deeper | Code | 1 |
| Distilling the Knowledge in a Neural Network | Code | 1 |
| Gradient-free variational learning with conditional mixture networks | Code | 1 |
| Large Multi-modality Model Assisted AI-Generated Image Quality Assessment | Code | 1 |
| GaVaMoE: Gaussian-Variational Gated Mixture of Experts for Explainable Recommendation | Code | 1 |
| Awaker2.5-VL: Stably Scaling MLLMs with Parameter-Efficient Mixture of Experts | Code | 1 |
| Learning to Skip the Middle Layers of Transformers | Code | 1 |
| GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts | Code | 1 |
| Frequency-Adaptive Pan-Sharpening with Mixture of Experts | Code | 1 |
| FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Code | 1 |
| Efficient Dictionary Learning with Switch Sparse Autoencoders | Code | 1 |
| Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts | Code | 1 |
| DirectMultiStep: Direct Route Generation for Multi-Step Retrosynthesis | Code | 1 |
| Gated Multimodal Units for Information Fusion | Code | 1 |
| Graph Sparsification via Mixture of Graphs | Code | 1 |
| Few-Shot and Continual Learning with Attentive Independent Mechanisms | Code | 1 |
| LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1 |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Code | 1 |
