| Title | Date | Tags | Code | Stars |
| --- | --- | --- | --- | --- |
| Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | May 21, 2025 | Chatbot, Instruction Following | Unverified | 0 |
| Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | May 21, 2025 | CPU | Code Available | 0 |
| CoLA: Collaborative Low-Rank Adaptation | May 21, 2025 | CoLA, Mixture-of-Experts | Code Available | 0 |
| MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | May 21, 2025 | Mixture-of-Experts | Code Available | 0 |
| Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks | May 21, 2025 | Mixture-of-Experts | Unverified | 0 |
| FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | May 20, 2025 | Language Modeling | Unverified | 0 |
| Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | May 20, 2025 | Continual Relation Extraction, Mixture-of-Experts | Code Available | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | May 20, 2025 | Mixture-of-Experts | Code Available | 0 |
| THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | May 20, 2025 | Machine Translation, Mixture-of-Experts | Unverified | 0 |
| StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | May 20, 2025 | Class-Incremental Learning | Unverified | 0 |
| Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | May 20, 2025 | Emotion Recognition, Mixture-of-Experts | Unverified | 0 |
| Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | May 20, 2025 | Domain Generalization | Unverified | 0 |
| EfficientLLM: Efficiency in Large Language Models | May 20, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| Balanced and Elastic End-to-end Training of Dynamic LLMs | May 20, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | May 19, 2025 | Mixture-of-Experts, Time Series | Unverified | 0 |
| CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | May 19, 2025 | Mixture-of-Experts | Code Available | 0 |
| Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | May 19, 2025 | Computational Efficiency, Ensemble Learning | Unverified | 0 |
| Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | May 19, 2025 | Fairness, Mixture-of-Experts | Unverified | 0 |
| Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | May 17, 2025 | Deep Attention, Mamba | Code Available | 0 |
| MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | May 17, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Model Merging in Pre-training of Large Language Models | May 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| Improving Coverage in Combined Prediction Sets with Weighted p-values | May 17, 2025 | Conformal Prediction, Mixture-of-Experts | Unverified | 0 |
| A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | May 16, 2025 | Causal Discovery, Causal Inference | Unverified | 0 |
| On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | May 16, 2025 | Language Modeling | Unverified | 0 |