| Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Nov 19, 2023 | Diversity, Mixture-of-Experts | Code Available | 1 |
| DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets | Nov 8, 2023 | Mixture-of-Experts, object-detection | Code Available | 1 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Oct 29, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation | Oct 24, 2023 | Code Generation, Code Translation | Code Available | 1 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Oct 18, 2023 | Blind Super-Resolution, Decoder | Code Available | 1 |
| Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Oct 15, 2023 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |
| Sparse Universal Transformer | Oct 11, 2023 | Mixture-of-Experts | Code Available | 1 |
| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Oct 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available | 1 |
| LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Sep 25, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available | 1 |
| Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Aug 23, 2023 | CPU, GPU | Code Available | 1 |
| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-Experts, NeRF | Code Available | 1 |
| HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Aug 12, 2023 | Attribute, Knowledge Graph Completion | Code Available | 1 |
| MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Jul 18, 2023 | Language Modelling, Mixture-of-Experts | Code Available | 1 |
| Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Jun 15, 2023 | Image Quality Assessment, Image Super-Resolution | Code Available | 1 |
| ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Jun 10, 2023 | Efficient ViTs, Mixture-of-Experts | Code Available | 1 |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Jun 7, 2023 | Mixture-of-Experts | Code Available | 1 |
| COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Jun 5, 2023 | Language Modeling, Language Modelling | Code Available | 1 |
| Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | May 30, 2023 | CPU, GPU | Code Available | 1 |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available | 1 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | May 20, 2023 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |
| Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | May 1, 2023 | Data Integration, Entity Resolution | Code Available | 1 |
| Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Apr 3, 2023 | Mixture-of-Experts, Transfer Learning | Code Available | 1 |
| Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild | Apr 2, 2023 | Image Quality Assessment, Mixture-of-Experts | Code Available | 1 |
| MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Mar 2, 2023 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Mar 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| Mixture of Decision Trees for Interpretable Machine Learning | Nov 26, 2022 | Interpretable Machine Learning, Mixture-of-Experts | Code Available | 1 |
| Spatial Mixture-of-Experts | Nov 24, 2022 | Mixture-of-Experts | Code Available | 1 |
| PAD-Net: An Efficient Framework for Dynamic Networks | Nov 10, 2022 | image-classification, Image Classification | Code Available | 1 |
| M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Oct 26, 2022 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Oct 14, 2022 | CPU, Machine Translation | Code Available | 1 |
| Mixture of Attention Heads: Selecting Attention Heads Per Token | Oct 11, 2022 | Computational Efficiency, Language Modeling | Code Available | 1 |
| Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Oct 8, 2022 | Domain Generalization, Knowledge Distillation | Code Available | 1 |
| Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Aug 16, 2022 | Mixture-of-Experts | Code Available | 1 |
| Towards Understanding Mixture of Experts in Deep Learning | Aug 4, 2022 | Deep Learning, Mixture-of-Experts | Code Available | 1 |
| Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Jul 24, 2022 | Deep Reinforcement Learning, Humanoid Control | Code Available | 1 |
| Sparse Mixture-of-Experts are Domain Generalizable Learners | Jun 8, 2022 | Domain Generalization, Mixture-of-Experts | Code Available | 1 |
| Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Jun 3, 2022 | Decoder, Image Segmentation | Code Available | 1 |
| Addressing Confounding Feature Issue for Causal Recommendation | May 13, 2022 | Mixture-of-Experts, Recommendation Systems | Code Available | 1 |
| StableMoE: Stable Routing Strategy for Mixture of Experts | Apr 18, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Apr 15, 2022 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |
| 3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition | Apr 7, 2022 | Mixture-of-Experts, speech-recognition | Code Available | 1 |
| Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Mar 27, 2022 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Mar 13, 2022 | Abstractive Text Summarization, Document Summarization | Code Available | 1 |
| Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Mar 2, 2022 | Language Modeling, Language Modelling | Code Available | 1 |
| EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Dec 29, 2021 | Language Modeling, Language Modelling | Code Available | 1 |
| Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Dec 16, 2021 | Generalizable Person Re-identification, Mixture-of-Experts | Code Available | 1 |
| Unsupervised Foreground Extraction via Deep Region Competition | Oct 29, 2021 | Image Segmentation, Inductive Bias | Code Available | 1 |
| HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Oct 8, 2021 | Abstractive Text Summarization, Decoder | Code Available | 1 |