SOTAVerified

Mixture-of-Experts

Papers

Showing 101–150 of 1312 papers

Title | Status | Hype
EfficientLLM: Efficiency in Large Language Models | - | 0
Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Code | 0
THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | - | 0
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | - | 0
U-SAM: An audio language Model for Unified Speech, Audio, and Music Understanding | Code | 1
Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training and Inference | Code | 1
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | - | 0
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | - | 0
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | - | 0
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | Code | 0
Model Merging in Pre-training of Large Language Models | - | 0
Improving Coverage in Combined Prediction Sets with Weighted p-values | - | 0
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | - | 0
Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Code | 0
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | - | 0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | - | 0
A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | - | 0
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | - | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | - | 0
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | - | 0
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | - | 0
UMoE: Unifying Attention and FFN with Shared Experts | - | 0
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | - | 0
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | - | 0
Seed1.5-VL Technical Report | - | 0
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | - | 0
Emotion-Qwen: Training Hybrid Experts for Unified Emotion and General Vision-Language Understanding | Code | 1
Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free | Code | 4
FloE: On-the-Fly MoE Inference on Memory-constrained GPU | - | 0
MxMoE: Mixed-precision Quantization for MoE with Accuracy and Performance Co-Design | Code | 1
Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | - | 0
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | - | 0
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | - | 0
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Code | 0
STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | - | 0
Faster MoE LLM Inference for Extremely Large Models | - | 0
3D Gaussian Splatting Data Compression with Mixture of Priors | - | 0
Towards Smart Point-and-Shoot Photography | - | 0
Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | - | 0
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | - | 0
Finger Pose Estimation for Under-screen Fingerprint Sensor | Code | 0
Learning Heterogeneous Mixture of Scene Experts for Large-scale Neural Radiance Fields | Code | 3
Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | - | 0
CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | - | 0
CICADA: Cross-Domain Interpretable Coding for Anomaly Detection and Adaptation in Multivariate Time Series | - | 0
Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing | Code | 1
MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling | - | 0
MicarVLMoE: A Modern Gated Cross-Aligned Vision-Language Mixture of Experts Model for Medical Image Captioning and Report Generation | Code | 0
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication | - | 0
Page 3 of 27

No leaderboard results yet.