| GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving | Jul 19, 2025 | Autonomous Driving, Bench2Drive | Code Available | 0 |
| R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning | Jul 17, 2025 | Mixture-of-Experts | Code Available | 0 |
| Mixture of Experts in Large Language Models | Jul 15, 2025 | Diversity, Language Modeling | Unverified | 0 |
| Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive Segmentation | Jul 13, 2025 | CPU, Interactive Segmentation | Unverified | 0 |
| KAT-V1: Kwai-AutoThink Technical Report | Jul 11, 2025 | Knowledge Distillation, Large Language Model | Unverified | 0 |
| MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Jul 9, 2025 | Mixture-of-Experts, Time Series | Code Available | 2 |
| Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | Jul 8, 2025 | Data Augmentation, Mixture-of-Experts | Unverified | 0 |
| A Survey on Prompt Tuning | Jul 8, 2025 | Computational Efficiency, Mixture-of-Experts | Code Available | 0 |
| Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | Jul 8, 2025 | Continual Learning, Mixture-of-Experts | Code Available | 0 |
| What You Have is What You Track: Adaptive and Robust Multimodal Tracking | Jul 8, 2025 | Mixture-of-Experts, Visual Tracking | Code Available | 0 |
| Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach | Jul 8, 2025 | Edge-computing, Federated Learning | Unverified | 0 |
| UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification | Jul 7, 2025 | Mixture-of-Experts | Unverified | 0 |
| Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Jul 7, 2025 | Inductive Bias, Mixture-of-Experts | Code Available | 2 |
| Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Jun 29, 2025 | Inference Optimization, Mixture-of-Experts | Code Available | 0 |
| Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | Jun 26, 2025 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Jun 26, 2025 | Mixture-of-Experts | Code Available | 0 |
| EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | Jun 26, 2025 | Compositional Zero-Shot Learning, Mixture-of-Experts | Unverified | 0 |
| Learning to Skip the Middle Layers of Transformers | Jun 26, 2025 | Mixture-of-Experts | Code Available | 1 |
| Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | Jun 25, 2025 | Clinical Knowledge, Computed Tomography (CT) | Unverified | 0 |
| An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | Jun 23, 2025 | Click-Through Rate Prediction, Mixture-of-Experts | Unverified | 0 |
| Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | Jun 23, 2025 | Mixture-of-Experts, Safety Alignment | Unverified | 0 |
| SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | Jun 20, 2025 | Mixture-of-Experts, Response Generation | Unverified | 0 |
| LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | Jun 17, 2025 | ARC, CoLA | Unverified | 0 |
| NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification | Jun 17, 2025 | Diagnostic, Mixture-of-Experts | Unverified | 0 |
| Utility-Driven Speculative Decoding for Mixture-of-Experts | Jun 17, 2025 | GPU, Large Language Model | Unverified | 0 |
| GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with Guided Selection Vectors | Jun 17, 2025 | Bilevel Optimization, Mixture-of-Experts | Code Available | 0 |
| Single-Example Learning in a Mixture of GPDMs with Latent Geometries | Jun 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | Jun 17, 2025 | Data Integration, Large Language Model | Unverified | 0 |
| Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | Jun 17, 2025 | Mixture-of-Experts | Unverified | 0 |
| MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | Jun 17, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| Exploring Speaker Diarization with Mixture of Experts | Jun 17, 2025 | Mixture-of-Experts, Speaker Diarization | Unverified | 0 |
| MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Jun 16, 2025 | Mixture-of-Experts, Reinforcement Learning (RL) | Code Available | 7 |
| Load Balancing Mixture of Experts with Similarity Preserving Routers | Jun 16, 2025 | Mixture-of-Experts | Unverified | 0 |
| EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Jun 16, 2025 | Mixture-of-Experts, Model Compression | Code Available | 0 |
| Serving Large Language Models on Huawei CloudMatrix384 | Jun 15, 2025 | Mixture-of-Experts, Quantization | Unverified | 0 |
| Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Jun 13, 2025 | Image Super-Resolution, Mixture-of-Experts | Code Available | 1 |
| Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | Jun 12, 2025 | Diversity, Minecraft | Unverified | 0 |
| GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | Jun 11, 2025 | Language Modeling, Language Modelling | Unverified | 0 |
| MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | Jun 10, 2025 | Diagnostic, Mixture-of-Experts | Unverified | 0 |
| A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Jun 9, 2025 | Mixture-of-Experts | Code Available | 0 |
| M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | Jun 9, 2025 | All, Image Restoration | Unverified | 0 |
| MIRA: Medical Time Series Foundation Model for Real-World Health Data | Jun 9, 2025 | Ethics, Missing Values | Unverified | 0 |
| STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Jun 9, 2025 | Graph Attention, Imputation | Code Available | 0 |
| MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Jun 9, 2025 | Click-Through Rate Prediction, Diversity | Code Available | 0 |
| MoE-GPS: Guidlines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | Jun 9, 2025 | GPU, Mixture-of-Experts | Unverified | 0 |
| Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | Jun 7, 2025 | Continual Learning, Federated Learning | Unverified | 0 |
| SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | Jun 6, 2025 | Mixture-of-Experts | Unverified | 0 |
| Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | Jun 5, 2025 | Fake News Detection, Knowledge Editing | Unverified | 0 |
| FlashDMoE: Fast Distributed MoE in a Single Kernel | Jun 5, 2025 | 16k, CPU | Code Available | 3 |
| Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | Jun 3, 2025 | Form, Mixture-of-Experts | Unverified | 0 |