SOTAVerified

Mixture-of-Experts

Papers

Showing 751–800 of 1312 papers

Title | Status | Hype
Identifying Shopping Intent in Product QA for Proactive Recommendations | - | 0
Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | - | 0
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts | - | 0
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts | - | 0
Half-Space Feature Learning in Neural Networks | - | 0
Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors | Code | 0
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | Code | 1
Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity | - | 0
Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks | - | 0
Jamba: A Hybrid Transformer-Mamba Language Model | Code | 0
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study | - | 0
Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Code | 2
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | - | 0
DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts | Code | 0
Task-Customized Mixture of Adapters for General Image Fusion | Code | 2
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2
Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Code | 3
Skeleton-Based Human Action Recognition with Noisy Labels | Code | 0
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Code | 2
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | - | 0
Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts | Code | 1
Scattered Mixture-of-Experts Implementation | Code | 2
Conditional computation in neural networks: principles and research trends | - | 0
Harder Tasks Need More Experts: Dynamic Routing in MoE Models | Code | 2
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | - | 0
Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs | Code | 0
MoAI: Mixture of All Intelligence for Large Language and Vision Models | Code | 3
Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts | - | 0
Unity by Diversity: Improved Representation Learning in Multimodal VAEs | Code | 1
MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts | - | 0
ConstitutionalExperts: Training a Mixture of Principle-based Prompts | - | 0
Video Relationship Detection Using Mixture of Experts | Code | 0
Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models | - | 0
TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts | Code | 2
Vanilla Transformers are Transfer Capability Teachers | - | 0
How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers | - | 0
Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Code | 5
Hypertext Entity Extraction in Webpage | - | 0
DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Code | 1
Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense | - | 0
Sequence-level Semantic Representation Fusion for Recommender Systems | Code | 1
XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection | Code | 1
An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement | - | 0
m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Code | 0
ASEM: Enhancing Empathy in Chatbot through Attention-based Sentiment and Emotion Modeling | Code | 0
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Code | 0
PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning | - | 0
LLMBind: A Unified Modality-Task Integration Framework | Code | 1
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Code | 2
Page 16 of 27
