
Mixture-of-Experts Papers

Showing 401–450 of 1312 papers

Title | Status | Hype
Configurable Foundation Models: Building LLMs from a Modular Perspective | - | 0
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | - | 0
Conditional computation in neural networks: principles and research trends | - | 0
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | - | 0
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | - | 0
Imitation Learning from Observations: An Autoregressive Mixture of Experts Approach | - | 0
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation | - | 0
FinTeamExperts: Role Specialized MOEs For Financial Analysis | - | 0
On the Adaptation to Concept Drift for CTR Prediction | - | 0
FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | - | 0
FloE: On-the-Fly MoE Inference on Memory-constrained GPU | - | 0
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | - | 0
FMT: A Multimodal Pneumonia Detection Model Based on Stacking MOE Framework | - | 0
ForceVLA: Enhancing VLA Models with a Force-aware MoE for Contact-rich Manipulation | - | 0
A Review of Sparse Expert Models in Deep Learning | - | 0
iMedImage Technical Report | - | 0
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs | - | 0
Complexity Experts are Task-Discriminative Learners for Any Image Restoration | - | 0
Fresh-CL: Feature Realignment through Experts on Hypersphere in Continual Learning | - | 0
From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape | - | 0
ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL | - | 0
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | - | 0
Continual Learning Using Task Conditional Neural Networks | - | 0
Full-Precision Free Binary Graph Neural Networks | - | 0
Functional-level Uncertainty Quantification for Calibrated Fine-tuning on LLMs | - | 0
Functional mixture-of-experts for classification | - | 0
FuseMoE: Mixture-of-Experts Transformers for Fleximodal Fusion | - | 0
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | - | 0
Finding Fantastic Experts in MoEs: A Unified Study for Expert Dropping Strategies and Observations | - | 0
Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models | - | 0
A Review of DeepSeek Models' Key Innovative Techniques | - | 0
AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts | - | 0
Gating Dropout: Communication-efficient Regularization for Sparsely Activated Transformers | - | 0
Imitation Learning from MPC for Quadrupedal Multi-Gait Control | - | 0
Coordination with Humans via Strategy Matching | - | 0
GEMNET: Effective Gated Gazetteer Representations for Recognizing Complex Entities in Low-context Input | - | 0
Generalizable Person Re-identification with Relevance-aware Mixture of Experts | - | 0
Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study | - | 0
Improved Training of Mixture-of-Experts Language GANs | - | 0
Affect in Tweets Using Experts Model | - | 0
Generator Assisted Mixture of Experts For Feature Acquisition in Batch | - | 0
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | - | 0
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | - | 0
GigaChat Family: Efficient Russian Language Modeling Through Mixture of Experts Architecture | - | 0
GLA in MediaEval 2018 Emotional Impact of Movies Task | - | 0
GLaM: Efficient Scaling of Language Models with Mixture-of-Experts | - | 0
FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts | - | 0
IDEA: An Inverse Domain Expert Adaptation Based Active DNN IP Protection Method | - | 0
FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation | - | 0
Hypertext Entity Extraction in Webpage | - | 0
Page 9 of 27

No leaderboard results yet.