| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-ExpertsNeRF | CodeCode Available | 1 | 5 |
| Examining Post-Training Quantization for Mixture-of-Experts: A Benchmark | Jun 12, 2024 | BenchmarkingMixture-of-Experts | CodeCode Available | 1 | 5 |
| Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Nov 19, 2023 | DiversityMixture-of-Experts | CodeCode Available | 1 | 5 |
| Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Aug 20, 2024 | AttributeMixture-of-Experts | CodeCode Available | 1 | 5 |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | CodeCode Available | 1 | 5 |
| AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Oct 14, 2022 | CPUMachine Translation | CodeCode Available | 1 | 5 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | DescriptiveEmotion Recognition | CodeCode Available | 1 | 5 |
| XMoE: Sparse Models with Fine-grained and Adaptive Expert Selection | Feb 27, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Multi-Head Mixture-of-Experts | Apr 23, 2024 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Multilinear Mixture of Experts: Scalable Expert Specialization through Factorization | Feb 19, 2024 | Attributecounterfactual | CodeCode Available | 1 | 5 |
| Go Wider Instead of Deeper | Jul 25, 2021 | Image ClassificationMixture-of-Experts | CodeCode Available | 1 | 5 |
| Efficient Dictionary Learning with Switch Sparse Autoencoders | Oct 10, 2024 | Dictionary LearningMixture-of-Experts | CodeCode Available | 1 | 5 |
| Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs | Jul 1, 2024 | GPUMixture-of-Experts | CodeCode Available | 1 | 5 |
| Efficient Fine-tuning of Audio Spectrogram Transformers via Soft Mixture of Adapters | Feb 1, 2024 | Mixture-of-Expertsparameter-efficient fine-tuning | CodeCode Available | 1 | 5 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Oct 18, 2023 | Blind Super-ResolutionDecoder | CodeCode Available | 1 | 5 |
| Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | Sep 9, 2021 | Mixture-of-ExpertsRetrieval | CodeCode Available | 1 | 5 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image GenerationMixture-of-Experts | CodeCode Available | 1 | 5 |
| JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model | May 22, 2025 | GPULong-range modeling | CodeCode Available | 1 | 5 |
| EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Dec 29, 2021 | Language ModelingLanguage Modelling | CodeCode Available | 1 | 5 |
| Dense Backpropagation Improves Training for Sparse Mixture-of-Experts | Apr 16, 2025 | Mixture-of-Experts | CodeCode Available | 1 | 5 |
| Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node | May 27, 2024 | Computational EfficiencyMixture-of-Experts | CodeCode Available | 1 | 5 |
| Multimodal Clinical Trial Outcome Prediction with Large Language Models | Feb 9, 2024 | Mixture-of-ExpertsPrediction | CodeCode Available | 1 | 5 |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Jun 7, 2023 | Mixture-of-Experts | CodeCode Available | 1 | 5 |
| Layerwise Recurrent Router for Mixture-of-Experts | Aug 13, 2024 | AttributeMixture-of-Experts | CodeCode Available | 1 | 5 |
| Sequence-level Semantic Representation Fusion for Recommender Systems | Feb 28, 2024 | Mixture-of-ExpertsRecommendation Systems | CodeCode Available | 1 | 5 |