| OMoE: Diversifying Mixture of Low-Rank Adaptation by Orthogonal Finetuning | Jan 17, 2025 | Computational Efficiency, Diversity | Unverified | 0 |
| LLM-Based Routing in Mixture of Experts: A Novel Framework for Trading | Jan 16, 2025 | Mixture-of-Experts, World Knowledge | Unverified | 0 |
| MiniMax-01: Scaling Foundation Models with Lightning Attention | Jan 14, 2025 | Mixture-of-Experts | Code Available | 7 |
| PSReg: Prior-guided Sparse Mixture of Experts for Point Cloud Registration | Jan 14, 2025 | Mixture-of-Experts, Point Cloud Registration | Unverified | 0 |
| GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism | Jan 14, 2025 | Mixture-of-Experts | Unverified | 0 |
| A Multi-Modal Deep Learning Framework for Pan-Cancer Prognosis | Jan 13, 2025 | Deep Learning, Mixture-of-Experts | Code Available | 0 |
| Transforming Vision Transformer: Towards Efficient Multi-Task Asynchronous Learning | Jan 12, 2025 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| TAMER: A Test-Time Adaptive MoE-Driven Framework for EHR Representation Learning | Jan 10, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 0 |
| Optimizing Distributed Deployment of Mixture-of-Experts Model Inference in Serverless Computing | Jan 9, 2025 | Bayesian Optimization, CPU | Unverified | 0 |
| mFabric: An Efficient and Scalable Fabric for Mixture-of-Experts Training | Jan 7, 2025 | Blocking, GPU | Unverified | 0 |
| LiMoE: Mixture of LiDAR Representation Learners from Automotive Scenes | Jan 7, 2025 | Mixture-of-Experts, Representation Learning | Code Available | 2 |
| Mixture-of-Experts Graph Transformers for Interpretable Particle Collision Detection | Jan 6, 2025 | Decision Making, Mixture-of-Experts | Code Available | 0 |
| Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | Jan 4, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| MoVE-KD: Knowledge Distillation for VLMs with Mixture of Visual Encoders | Jan 3, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |
| Learning Heterogeneous Tissues with Mixture of Experts for Gigapixel Whole Slide Images | Jan 1, 2025 | Mixture-of-Experts, Whole Slide Images | Unverified | 0 |
| UNIALIGN: Scaling Multimodal Alignment within One Unified Model | Jan 1, 2025 | Mixture-of-Experts | Unverified | 0 |
| Correlative and Discriminative Label Grouping for Multi-Label Visual Prompt Tuning | Jan 1, 2025 | Image Classification | Unverified | 0 |
| Towards Efficient Foundation Model for Zero-shot Amodal Segmentation | Jan 1, 2025 | Mixture-of-Experts | Unverified | 0 |
| MExD: An Expert-Infused Diffusion Model for Whole-Slide Image Classification | Jan 1, 2025 | Image Classification | Unverified | 0 |
| REM: A Scalable Reinforced Multi-Expert Framework for Multiplex Influence Maximization | Jan 1, 2025 | Mixture-of-Experts | Unverified | 0 |
| CNC: Cross-modal Normality Constraint for Unsupervised Multi-class Anomaly Detection | Dec 31, 2024 | Anomaly Detection, Attribute | Unverified | 0 |
| Superposition in Transformers: A Novel Way of Building Mixture of Experts | Dec 31, 2024 | Mixture-of-Experts | Code Available | 2 |
| Multimodal Variational Autoencoder: a Barycentric View | Dec 29, 2024 | Mixture-of-Experts, Representation Learning | Unverified | 0 |
| UniRestorer: Universal Image Restoration via Adaptively Estimating Image Degradation at Proper Granularity | Dec 28, 2024 | Image Restoration, Mixture-of-Experts | Code Available | 0 |
| DeepSeek-V3 Technical Report | Dec 27, 2024 | GPU, Language Modeling | Code Available | 16 |
| Graph Mixture of Experts and Memory-augmented Routers for Multivariate Time Series Anomaly Detection | Dec 26, 2024 | Anomaly Detection, Mixture-of-Experts | Unverified | 0 |
| AskChart: Universal Chart Understanding through Textual Enhancement | Dec 26, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 0 |
| BIG-MoE: Bypass Isolated Gating MoE for Generalized Multimodal Face Anti-Spoofing | Dec 24, 2024 | Decision Making, Face Anti-Spoofing | Code Available | 0 |
| UME: Upcycling Mixture-of-Experts for Scalable and Efficient Automatic Speech Recognition | Dec 23, 2024 | Automatic Speech Recognition (ASR) | Unverified | 0 |
| BrainMAP: Learning Multiple Activation Pathways in Brain Networks | Dec 23, 2024 | Mamba, Mixture-of-Experts | Code Available | 1 |
| Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models | Dec 22, 2024 | Mixture-of-Experts, POS | Unverified | 0 |
| Theory of Mixture-of-Experts for Mobile Edge Computing | Dec 20, 2024 | Computational Efficiency, Continual Learning | Unverified | 0 |
| Qwen2.5 Technical Report | Dec 19, 2024 | Common Sense Reasoning | Code Available | 13 |
| ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing | Dec 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| A Survey on Inference Optimization Techniques for Mixture of Experts Models | Dec 18, 2024 | Computational Efficiency, Distributed Computing | Code Available | 3 |
| MedCoT: Medical Chain of Thought via Hierarchical Expert | Dec 18, 2024 | Diagnostic, Medical Visual Question Answering | Code Available | 1 |
| SEKE: Specialised Experts for Keyword Extraction | Dec 18, 2024 | Descriptive, Keyword Extraction | Code Available | 0 |
| SMOSE: Sparse Mixture of Shallow Experts for Interpretable Reinforcement Learning in Continuous Control Tasks | Dec 17, 2024 | Continuous Control | Code Available | 0 |
| DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference | Dec 16, 2024 | CPU, GPU | Code Available | 0 |
| Enhancing Healthcare Recommendation Systems with a Multimodal LLMs-based MOE Architecture | Dec 16, 2024 | Mixture-of-Experts, Recommendation Systems | Unverified | 0 |
| Investigating Mixture of Experts in Dense Retrieval | Dec 16, 2024 | Information Retrieval, Mixture-of-Experts | Unverified | 0 |
| Towards Adversarial Robustness of Model-Level Mixture-of-Experts Architectures for Semantic Segmentation | Dec 16, 2024 | Adversarial Robustness, Mixture-of-Experts | Code Available | 0 |
| Wonderful Matrices: Combining for a More Efficient and Effective Foundation Model Architecture | Dec 16, 2024 | Mixture-of-Experts, Position | Code Available | 1 |
| DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Dec 14, 2024 | Mixture-of-Experts, Object | Code Available | 2 |
| Llama 3 Meets MoE: Efficient Upcycling | Dec 13, 2024 | Mixture-of-Experts, MMLU | Code Available | 0 |
| DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding | Dec 13, 2024 | Chart Understanding, Mixture-of-Experts | Code Available | 9 |
| Towards a Multimodal Large Language Model with Pixel-Level Insight for Biomedicine | Dec 12, 2024 | Language Modeling | Code Available | 2 |
| Mixture of Experts Meets Decoupled Message Passing: Towards General and Adaptive Node Classification | Dec 11, 2024 | Computational Efficiency | Code Available | 0 |
| Adaptive Prompting for Continual Relation Extraction: A Within-Task Variance Perspective | Dec 11, 2024 | Continual Relation Extraction, Mixture-of-Experts | Unverified | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | Dec 10, 2024 | Benchmarking, Mixture-of-Experts | Unverified | 0 |