| Title | Date | Tags | Code | # |
|---|---|---|---|---|
| EfficientLLM: Efficiency in Large Language Models | May 20, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | May 20, 2025 | Continual Relation Extraction, Mixture-of-Experts | Code Available | 0 |
| THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | May 20, 2025 | Machine Translation, Mixture-of-Experts | Unverified | 0 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | May 20, 2025 | Audio-Visual Speech Recognition, Mixture-of-Experts | Unverified | 0 |
| U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | May 20, 2025 | Cross-Modal Alignment, Language Modeling | Code Available | 1 |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | May 19, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |
| Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | May 19, 2025 | Computational Efficiency, Ensemble Learning | Unverified | 0 |
| True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | May 19, 2025 | Mixture-of-Experts, Time Series | Unverified | 0 |
| Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | May 19, 2025 | Fairness, Mixture-of-Experts | Unverified | 0 |
| CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | May 19, 2025 | Mixture-of-Experts | Code Available | 0 |
| Model Merging in Pre-training of Large Language Models | May 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| Improving Coverage in Combined Prediction Sets with Weighted p-values | May 17, 2025 | Conformal Prediction, Mixture-of-Experts | Unverified | 0 |
| MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | May 17, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | May 17, 2025 | Deep Attention, Mamba | Code Available | 0 |
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | May 16, 2025 | Mixture-of-Experts | Unverified | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | May 16, 2025 | Benchmarking, Mixture-of-Experts | Unverified | 0 |
| A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | May 16, 2025 | Causal Discovery, Causal Inference | Unverified | 0 |
| On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | May 16, 2025 | Language Modeling | Unverified | 0 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | May 14, 2025 | Math, Mathematical Problem-Solving | Code Available | 0 |
| Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | May 14, 2025 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | May 13, 2025 | Mixture-of-Experts | Unverified | 0 |
| PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | May 13, 2025 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| UMoE: Unifying Attention and FFN with Shared Experts | May 12, 2025 | Mixture-of-Experts | Unverified | 0 |
| FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | May 11, 2025 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | May 11, 2025 | Mixture-of-Experts | Unverified | 0 |
| Seed1.5-VL Technical Report | May 11, 2025 | Mixture-of-Experts, Multimodal Reasoning | Unverified | 0 |
| QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | May 10, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | May 10, 2025 | Descriptive, Emotion Recognition | Code Available | 1 |
| Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | May 10, 2025 | Attribute, Mixture-of-Experts | Code Available | 4 |
| FloE: On-the-Fly MoE Inference on Memory-constrained GPU | May 9, 2025 | CPU, GPU | Unverified | 0 |
| MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | May 9, 2025 | Mixture-of-Experts, Quantization | Code Available | 1 |
| Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | May 8, 2025 | Mixture-of-Experts | Unverified | 0 |
| Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | May 7, 2025 | Mixture-of-Experts | Unverified | 0 |
| SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | May 7, 2025 | Diversity, Mixture-of-Experts | Unverified | 0 |
| LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | May 7, 2025 | Large Language Model, Mixture-of-Experts | Code Available | 0 |
| STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | May 6, 2025 | Diversity, Mixture-of-Experts | Unverified | 0 |
| Faster MoE LLM Inference for Extremely Large Models | May 6, 2025 | Inference Optimization, Mixture-of-Experts | Unverified | 0 |
| 3D Gaussian Splatting Data Compression with Mixture of Priors | May 6, 2025 | 3DGS, Data Compression | Unverified | 0 |
| Towards Smart Point-and-Shoot Photography | May 6, 2025 | Mixture-of-Experts, Word Embeddings | Unverified | 0 |
| Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | May 5, 2025 | Beam Prediction, Deep Learning | Unverified | 0 |
| Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | May 5, 2025 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |
| Finger Pose Estimation for Under-screen Fingerprint Sensor | May 5, 2025 | Mixture-of-Experts, Pose Estimation | Code Available | 0 |
| Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | May 4, 2025 | Mixture-of-Experts, NeRF | Code Available | 3 |
| Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | May 2, 2025 | Mixture-of-Experts | Unverified | 0 |
| CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | May 2, 2025 | Mixture-of-Experts, Philosophy | Unverified | 0 |
| CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | May 1, 2025 | Anomaly Detection, Meta-Learning | Unverified | 0 |
| Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | May 1, 2025 | Mixture-of-Experts | Code Available | 1 |
| MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling | May 1, 2025 | Language Modeling | Unverified | 0 |
| MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Apr 29, 2025 | Cross-Modal Alignment, Decoder | Code Available | 0 |
| Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | Apr 28, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |