SOTAVerified

Mixture-of-Experts

Papers

Showing 101–125 of 1312 papers

| Title | Status | Hype |
|---|---|---|
| EfficientLLM: Efficiency in Large Language Models | | 0 |
| StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | | 0 |
| Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Code | 0 |
| U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Code | 1 |
| Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | | 0 |
| Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | | 0 |
| Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Code | 1 |
| Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | | 0 |
| CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | Code | 0 |
| True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | | 0 |
| Model Merging in Pre-training of Large Language Models | | 0 |
| Improving Coverage in Combined Prediction Sets with Weighted p-values | | 0 |
| Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Code | 0 |
| MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | | 0 |
| MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | | 0 |
| MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | | 0 |
| A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | | 0 |
| On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | | 0 |
| Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | | 0 |
| PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0 |
| PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | | 0 |
| AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | | 0 |
| UMoE: Unifying Attention and FFN with Shared Experts | | 0 |
| Seed1.5-VL Technical Report | | 0 |
| FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | | 0 |
Page 5 of 53

No leaderboard results yet.