| Identifying Shopping Intent in Product QA for Proactive Recommendations | Apr 9, 2024 | Friction, Mixture-of-Experts | Unverified | 0 |
| Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models | Apr 8, 2024 | GPU, Mixture-of-Experts | Unverified | 0 |
| SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts | Apr 7, 2024 | Mixture-of-Experts | Unverified | 0 |
| Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts | Apr 7, 2024 | Mixture-of-Experts | Unverified | 0 |
| Half-Space Feature Learning in Neural Networks | Apr 5, 2024 | Mixture-of-Experts | Unverified | 0 |
| Two Heads are Better than One: Nested PoE for Robust Defense Against Multi-Backdoors | Apr 2, 2024 | Data Poisoning, Hate Speech Detection | Code Available | 0 |
| LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Apr 1, 2024 | Decision Making, Language Modeling | Code Available | 1 |
| Prompt-prompted Adaptive Structured Pruning for Efficient LLM Generation | Apr 1, 2024 | Mixture-of-Experts | Code Available | 1 |
| Psychometry: An Omnifit Model for Image Reconstruction from Human Brain Activity | Mar 29, 2024 | Brain Computer Interface, Image Reconstruction | Unverified | 0 |
| Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks | Mar 29, 2024 | Mixture-of-Experts | Unverified | 0 |
| Jamba: A Hybrid Transformer-Mamba Language Model | Mar 28, 2024 | GPU, Language Modeling | Code Available | 0 |
| Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study | Mar 26, 2024 | Learning Theory, Mixture-of-Experts | Unverified | 0 |
| Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Mar 26, 2024 | Decoder, Mixture-of-Experts | Code Available | 2 |
| GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | Mar 20, 2024 | Mixture-of-Experts, Multi-Task Learning | Unverified | 0 |
| DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts | Mar 20, 2024 | Information Retrieval, Mixture-of-Experts | Code Available | 0 |
| Task-Customized Mixture of Adapters for General Image Fusion | Mar 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, Parameter-Efficient Fine-Tuning | Code Available | 2 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3 |
| Skeleton-Based Human Action Recognition with Noisy Labels | Mar 15, 2024 | Action Recognition, Denoising | Code Available | 0 |
| Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Mar 14, 2024 | Denoising, Mixture-of-Experts | Code Available | 2 |
| MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | Mar 14, 2024 | In-Context Learning, Mixture-of-Experts | Unverified | 0 |
| Unleashing the Power of Meta-tuning for Few-shot Generalization Through Sparse Interpolated Experts | Mar 13, 2024 | Domain Generalization, Few-Shot Image Classification | Code Available | 1 |
| Scattered Mixture-of-Experts Implementation | Mar 13, 2024 | Mixture-of-Experts | Code Available | 2 |
| Conditional computation in neural networks: principles and research trends | Mar 12, 2024 | Mixture-of-Experts, Scientific Discovery | Unverified | 0 |
| Harder Tasks Need More Experts: Dynamic Routing in MoE Models | Mar 12, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 2 |
| Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM | Mar 12, 2024 | Arithmetic Reasoning, Code Generation | Unverified | 0 |
| Equipping Computational Pathology Systems with Artifact Processing Pipelines: A Showcase for Computation and Performance Trade-offs | Mar 12, 2024 | Airbubbles Detection, Anomaly Detection | Code Available | 0 |
| MoAI: Mixture of All Intelligence for Large Language and Vision Models | Mar 12, 2024 | All, Mixture-of-Experts | Code Available | 3 |
| Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts | Mar 11, 2024 | Mixture-of-Experts, Reinforcement Learning (RL) | Unverified | 0 |
| Unity by Diversity: Improved Representation Learning in Multimodal VAEs | Mar 8, 2024 | Decoder, Diversity | Code Available | 1 |
| MMoE: Robust Spoiler Detection with Multi-modal Information and Domain-aware Mixture-of-Experts | Mar 8, 2024 | Domain Generalization, Mixture-of-Experts | Unverified | 0 |
| ConstitutionalExperts: Training a Mixture of Principle-based Prompts | Mar 7, 2024 | Mixture-of-Experts | Unverified | 0 |
| Video Relationship Detection Using Mixture of Experts | Mar 6, 2024 | Action Recognition, Mixture-of-Experts | Code Available | 0 |
| Mixture-of-LoRAs: An Efficient Multitask Tuning for Large Language Models | Mar 6, 2024 | Mixture-of-Experts, Multi-Task Learning | Unverified | 0 |
| TESTAM: A Time-Enhanced Spatio-Temporal Attention Model with Mixture of Experts | Mar 5, 2024 | Graph Attention, Graph Embedding | Code Available | 2 |
| Vanilla Transformers are Transfer Capability Teachers | Mar 4, 2024 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| How does Architecture Influence the Base Capabilities of Pre-trained Language Models? A Case Study Based on FFN-Wider and MoE Transformers | Mar 4, 2024 | Few-Shot Learning, Language Modeling | Unverified | 0 |
| Rethinking LLM Language Adaptation: A Case Study on Chinese Mixtral | Mar 4, 2024 | Language Modeling | Code Available | 5 |
| Hypertext Entity Extraction in Webpage | Mar 4, 2024 | Mixture-of-Experts | Unverified | 0 |
| DMoERM: Recipes of Mixture-of-Experts for Effective Reward Modeling | Mar 2, 2024 | Language Modeling, Large Language Model | Code Available | 1 |
| Enhancing the "Immunity" of Mixture-of-Experts Networks for Adversarial Defense | Feb 29, 2024 | Adversarial Defense, Adversarial Robustness | Unverified | 0 |
| Sequence-level Semantic Representation Fusion for Recommender Systems | Feb 28, 2024 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 |
| XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection | Feb 27, 2024 | Language Modeling | Code Available | 1 |
| An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement | Feb 27, 2024 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| m2mKD: Module-to-Module Knowledge Distillation for Modular Transformers | Feb 26, 2024 | Knowledge Distillation, Mixture-of-Experts | Code Available | 0 |
| ASEM: Enhancing Empathy in Chatbot through Attention-based Sentiment and Emotion Modeling | Feb 25, 2024 | Chatbot, Diversity | Code Available | 0 |
| Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts | Feb 23, 2024 | Mixture-of-Experts | Code Available | 0 |
| PEMT: Multi-Task Correlation Guided Mixture-of-Experts Enables Parameter-Efficient Transfer Learning | Feb 23, 2024 | Mixture-of-Experts, Parameter-Efficient Fine-Tuning | Unverified | 0 |
| LLMBind: A Unified Modality-Task Integration Framework | Feb 22, 2024 | AI Agent, Audio Generation | Code Available | 1 |
| Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Feb 22, 2024 | All, Mixture-of-Experts | Code Available | 2 |