| Improving Coverage in Combined Prediction Sets with Weighted p-values | May 17, 2025 | Conformal Prediction, Mixture-of-Experts | Unverified | 0 |
| Model Merging in Pre-training of Large Language Models | May 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | May 17, 2025 | Deep Attention, Mamba | Code Available | 0 |
| MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | May 17, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | May 16, 2025 | Mixture-of-Experts | Unverified | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | May 16, 2025 | Benchmarking, Mixture-of-Experts | Unverified | 0 |
| A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | May 16, 2025 | Causal Discovery, Causal Inference | Unverified | 0 |
| On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | May 16, 2025 | Language Modeling | Unverified | 0 |
| Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | May 14, 2025 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |