SOTAVerified

Mixture-of-Experts

Papers

Showing 651–700 of 1312 papers

Title | Status | Hype
Scaling Laws for Native Multimodal Models | - | 0
Scaling Vision-Language Models with Sparse Mixture of Experts | - | 0
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | - | 0
SciDFM: A Large Language Model with Mixture-of-Experts for Science | - | 0
SC-MoE: Switch Conformer Mixture of Experts for Unified Streaming and Non-streaming Code-Switching ASR | - | 0
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks | - | 0
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning | - | 0
Seed1.5-VL Technical Report | - | 0
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | - | 0
SEER-MoE: Sparse Expert Efficiency through Regularization for Mixture-of-Experts | - | 0
Self-tuned Visual Subclass Learning with Shared Samples: An Incremental Approach | - | 0
Semantic-Aware Dynamic Parameter for Video Inpainting Transformer | - | 0
Probing Semantic Routing in Large Mixture-of-Expert Models | - | 0
SemEval-2025 Task 1: AdMIRe -- Advancing Multimodal Idiomaticity Representation | - | 0
MoESys: A Distributed and Efficient Mixture-of-Experts Training and Inference System for Internet Services | - | 0
Serving Large Language Models on Huawei CloudMatrix384 | - | 0
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | - | 0
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts | - | 0
Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts | - | 0
Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective | - | 0
Simple or Complex? Complexity-Controllable Question Generation with Soft Templates and Deep Mixture of Experts Model | - | 0
SimSMoE: Solving Representational Collapse via Similarity Measure | - | 0
Simultaneous Feature and Expert Selection within Mixture of Experts | - | 0
Single-Example Learning in a Mixture of GPDMs with Latent Geometries | - | 0
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | - | 0
SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities | - | 0
SMILE: Scaling Mixture-of-Experts with Efficient Bi-level Routing | - | 0
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning | - | 0
Sparsely Activated Mixture-of-Experts are Robust Multi-Task Learners | - | 0
Sparse Mixers: Combining MoE and Mixing to build a more efficient BERT | - | 0
Sparse Mixture of Experts as Unified Competitive Learning | - | 0
Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images | - | 0
Cross-token Modeling with Conditional Computation | - | 0
Sparse Upcycling: Inference Inefficient Finetuning | - | 0
Sparse Video Representation Using Steered Mixture-of-Experts With Global Motion Compensation | - | 0
Sparsity-Constrained Optimal Transport | - | 0
Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling | - | 0
SpeechMatrix: A Large-Scale Mined Corpus of Multilingual Speech-to-Speech Translations | - | 0
SpeechMoE2: Mixture-of-Experts Model with Improved Routing | - | 0
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis | - | 0
SPMoE: Generate Multiple Pattern-Aware Outputs with Sparse Pattern Mixture of Experts | - | 0
SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | - | 0
StableMoE: Stable Routing Strategy for Mixture of Experts | - | 0
STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | - | 0
Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | - | 0
Statistical Advantages of Perturbing Cosine Router in Mixture of Experts | - | 0
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | - | 0
Stealing User Prompts from Mixture of Experts | - | 0
Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising | - | 0
Page 14 of 27

No leaderboard results yet.