SOTAVerified

Mixture-of-Experts

Papers

Showing 851–900 of 1312 papers

Title / Status / Hype
(Status is blank and Hype is 0 for every paper on this page; titles follow.)

Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks
Ring-lite: Scalable Reasoning via C3PO-Stabilized Reinforcement Learning for LLMs
RingMoE: Mixture-of-Modality-Experts Multi-Modal Foundation Models for Universal Remote Sensing Image Interpretation
Robust and Explainable Depression Identification from Speech Using Vowel-Based Ensemble Learning Approaches
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Robust Calibration For Improved Weather Prediction Under Distributional Shift
Robust mixture of experts modeling using the skew t distribution
Robust mixture of experts modeling using the t distribution
RocketPPA: Code-Level Power, Performance, and Area Prediction via LLM and Mixture of Experts
Routers in Vision Mixture of Experts: An Empirical Study
RS-MoE: Mixture of Experts for Remote Sensing Image Captioning and Visual Question Answering
RTM Ensemble Learning Results at Quality Estimation Task
RTM Stacking Results for Machine Translation Performance Prediction
RTM Super Learner Results at Quality Estimation Task
S2MoE: Robust Sparse Mixture of Experts via Stochastic Learning
Safe Real-World Autonomous Driving by Learning to Predict and Plan with a Mixture of Experts
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation
Scalable and Efficient MoE Training for Multitask Multilingual Models
Scalable Multi-Domain Adaptation of Language Models using Modular Experts
Scalable Neural Data Server: A Data Recommender for Transfer Learning
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach
Scaling Intelligence: Designing Data Centers for Next-Gen Language Models
Scaling Laws Across Model Architectures: A Comparative Analysis of Dense and MoE Models in Large Language Models
Scaling Laws for Native Multimodal Models
Scaling Vision-Language Models with Sparse Mixture of Experts
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection
SciDFM: A Large Language Model with Mixture-of-Experts for Science
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning
Seed1.5-VL Technical Report
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts
Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach
Semantic-Aware Dynamic Parameter for Video Inpainting Transformer
Probing Semantic Routing in Large Mixture-of-Expert Models
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation
MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services
Serving Large Language Models on Huawei CloudMatrix384
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective
Simple or Complex? Complexity-Controllable Question Generation with Soft Templates and Deep Mixture of Experts Model
SimSMoE: Solving Representational Collapse via Similarity Measure
Simultaneous Feature and Expert Selection within Mixture of Experts
Single-Example Learning in a Mixture of GPDMs with Latent Geometries
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities
Page 18 of 27

No leaderboard results yet.