SOTAVerified

Mixture-of-Experts

Papers

Showing 1–50 of 1312 papers

Title | Status | Hype
GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving | Code | 0
R^2MoE: Redundancy-Removal Mixture of Experts for Lifelong Concept Learning | Code | 0
Mixture of Experts in Large Language Models | – | 0
Inter2Former: Dynamic Hybrid Attention for Efficient High-Precision Interactive | – | 0
KAT-V1: Kwai-AutoThink Technical Report | – | 0
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Code | 2
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | – | 0
A Survey on Prompt Tuning | Code | 0
Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate | Code | 0
What You Have is What You Track: Adaptive and Robust Multimodal Tracking | Code | 0
Efficient Training of Large-Scale AI Models Through Federated Mixture-of-Experts: A System-Level Approach | – | 0
UGG-ReID: Uncertainty-Guided Graph Model for Multi-Modal Object Re-Identification | – | 0
Learning Robust Stereo Matching in the Wild with Selective Mixture-of-Experts | Code | 2
Sub-MoE: Efficient Mixture-of-Expert LLMs Compression via Subspace Expert Merging | Code | 0
Little By Little: Continual Learning via Self-Activated Sparse Mixture-of-Rank Adaptive Learning | – | 0
Latent Prototype Routing: Achieving Near-Perfect Load Balancing in Mixture-of-Experts | Code | 0
EVA: Mixture-of-Experts Semantic Variant Alignment for Compositional Zero-Shot Learning | – | 0
Learning to Skip the Middle Layers of Transformers | Code | 1
Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration | – | 0
An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify | – | 0
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | – | 0
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification | – | 0
LoRA-Mixer: Coordinate Modular LoRA Experts Through Serial Attention Routing | – | 0
NeuroMoE: A Transformer-Based Mixture-of-Experts Framework for Multi-Modal Neurological Disorder Classification | – | 0
Utility-Driven Speculative Decoding for Mixture-of-Experts | – | 0
GuiLoMo: Allocating Expert Number and Rank for LoRA-MoE via Bilevel Optimization with Guided Selection Vectors | Code | 0
Single-Example Learning in a Mixture of GPDMs with Latent Geometries | – | 0
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs | – | 0
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models | – | 0
MoTE: Mixture of Ternary Experts for Memory-efficient Large Multimodal Models | – | 0
Exploring Speaker Diarization with Mixture of Experts | – | 0
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention | Code | 7
Load Balancing Mixture of Experts with Similarity Preserving Routers | – | 0
EAQuant: Enhancing Post-Training Quantization for MoE Models via Expert-Aware Optimization | Code | 0
Serving Large Language Models on Huawei CloudMatrix384 | – | 0
Structural Similarity-Inspired Unfolding for Lightweight Image Super-Resolution | Code | 1
Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts | – | 0
GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | – | 0
MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding | – | 0
A Two-Phase Deep Learning Framework for Adaptive Time-Stepping in High-Speed Flow Modeling | Code | 0
M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration | – | 0
MIRA: Medical Time Series Foundation Model for Real-World Health Data | – | 0
STAMImputer: Spatio-Temporal Attention MoE for Traffic Data Imputation | Code | 0
MoE-MLoRA for Multi-Domain CTR Prediction: Efficient Adaptation with Expert Specialization | Code | 0
MoE-GPS: Guidelines for Prediction Strategy for Dynamic Expert Duplication in MoE Load Balancing | – | 0
Breaking Data Silos: Towards Open and Scalable Mobility Foundation Models via Generative Continual Learning | – | 0
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | – | 0
Lifelong Evolution: Collaborative Learning between Large and Small Language Models for Continuous Emergent Fake News Detection | – | 0
FlashDMoE: Fast Distributed MoE in a Single Kernel | Code | 3
Brain-Like Processing Pathways Form in Models With Heterogeneous Experts | – | 0
Page 1 of 27

No leaderboard results yet.