| Predictable Scale: Part I -- Optimal Hyperparameter Scaling Law in Large Language Model Pretraining | Mar 6, 2025 | GPU, Hyperparameter Optimization | Unverified | 0 |
| A Generalist Cross-Domain Molecular Learning Framework for Structure-Based Drug Discovery | Mar 6, 2025 | Denoising, Drug Discovery | Unverified | 0 |
| Question-Aware Gaussian Experts for Audio-Visual Question Answering | Mar 6, 2025 | Audio-visual Question Answering, Audio-Visual Question Answering (AVQA) | Code Available | 1 |
| Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | Mar 6, 2025 | Mixture-of-Experts, Scheduling | Unverified | 0 |
| BrainNet-MoE: Brain-Inspired Mixture-of-Experts Learning for Neurological Disease Identification | Mar 5, 2025 | Mixture-of-Experts | Unverified | 0 |
| VoiceGRPO: Modern MoE Transformers with Group Relative Policy Optimization GRPO for AI Voice Health Care Applications on Voice Pathology Detection | Mar 5, 2025 | Diagnostic, Mixture-of-Experts | Code Available | 0 |
| Convergence Rates for Softmax Gating Mixture of Experts | Mar 5, 2025 | Mixture-of-Experts, parameter estimation | Unverified | 0 |
| Small but Mighty: Enhancing Time Series Forecasting with Lightweight LLMs | Mar 5, 2025 | Computational Efficiency, Descriptive | Code Available | 1 |
| Tabby: Tabular Data Synthesis with Language Models | Mar 4, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MX-Font++: Mixture of Heterogeneous Aggregation Experts for Few-shot Font Generation | Mar 4, 2025 | Font Generation, Mixture-of-Experts | Code Available | 1 |
| Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer | Mar 4, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 0 |
| How Do Consumers Really Choose: Exposing Hidden Preferences with the Mixture of Experts Model | Mar 3, 2025 | Decision Making, Demand Forecasting | Unverified | 0 |
| Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | Mar 3, 2025 | Domain Generalization, Mixture-of-Experts | Unverified | 0 |
| ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | Mar 3, 2025 | Autonomous Driving, Computational Efficiency | Unverified | 0 |
| DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | Mar 3, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| PROPER: A Progressive Learning Framework for Personalized Large Language Models with Group-Level Adaptation | Mar 3, 2025 | Mixture-of-Experts, parameter-efficient fine-tuning | Unverified | 0 |
| Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | Mar 2, 2025 | Mixture-of-Experts, whole slide images | Unverified | 0 |
| CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | Mar 1, 2025 | Continual Learning, Language Modeling | Unverified | 0 |
| CoSMoEs: Compact Sparse Mixture of Experts | Feb 28, 2025 | Mixture-of-Experts | Unverified | 0 |
| Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | Feb 27, 2025 | Action Detection, Activity Detection | Unverified | 0 |
| R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts | Feb 27, 2025 | Mixture-of-Experts | Code Available | 1 |
| UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | Feb 27, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| Comet: Fine-grained Computation-communication Overlapping for Mixture-of-Experts | Feb 27, 2025 | Computational Efficiency, GPU | Code Available | 5 |
| Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | Feb 27, 2025 | Mixture-of-Experts | Unverified | 0 |
| Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | Feb 26, 2025 | Mixture-of-Experts | Unverified | 0 |
| OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | Feb 26, 2025 | Mixture-of-Experts, Recommendation Systems | Unverified | 0 |
| Delta Decompression for MoE-based LLMs Compression | Feb 24, 2025 | Diversity, Mixture-of-Experts | Code Available | 2 |
| The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | Feb 24, 2025 | Linear Mode Connectivity, Mixture-of-Experts | Unverified | 0 |
| ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | Feb 24, 2025 | Diagnostic, Mixture-of-Experts | Unverified | 0 |
| BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | Feb 24, 2025 | Mixture-of-Experts | Unverified | 0 |
| Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | Feb 24, 2025 | Mixture-of-Experts, MMLU | Unverified | 0 |
| Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and Mixture-of-Experts Optimization Alignment | Feb 24, 2025 | image-classification, Image Classification | Code Available | 2 |
| An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | Feb 22, 2025 | ARC, Continual Learning | Unverified | 0 |
| Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Feb 21, 2025 | Mixture-of-Experts | Code Available | 0 |
| Tight Clusters Make Specialized Experts | Feb 21, 2025 | Clustering, Language Modeling | Code Available | 0 |
| Ray-Tracing for Conditionally Activated Neural Networks | Feb 20, 2025 | Mixture-of-Experts | Unverified | 0 |
| ChatVLA: Unified Multimodal Understanding and Robot Control with Vision-Language-Action Model | Feb 20, 2025 | Mixture-of-Experts, Question Answering | Code Available | 1 |
| Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | Feb 19, 2025 | Dictionary Learning, Mixture-of-Experts | Unverified | 0 |
| DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | Feb 18, 2025 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | Feb 18, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |
| MoBA: Mixture of Block Attention for Long-Context LLMs | Feb 18, 2025 | Mixture-of-Experts | Code Available | 7 |
| Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Feb 17, 2025 | GPU, Mixture-of-Experts | Code Available | 0 |
| Connector-S: A Survey of Connectors in Multi-modal Large Language Models | Feb 17, 2025 | Mixture-of-Experts, Survey | Unverified | 0 |
| How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | Feb 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | Feb 16, 2025 | energy management, Mixture-of-Experts | Unverified | 0 |
| Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time | Feb 16, 2025 | Mixture-of-Experts | Unverified | 0 |
| Probing Semantic Routing in Large Mixture-of-Expert Models | Feb 15, 2025 | Mixture-of-Experts, Sentence | Unverified | 0 |
| Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Feb 13, 2025 | Mixture-of-Experts | Code Available | 0 |
| Heterogeneous Mixture of Experts for Remote Sensing Image Super-Resolution | Feb 12, 2025 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | Feb 12, 2025 | Mixture-of-Experts, Node Classification | Unverified | 0 |