SOTAVerified

Mixture-of-Experts

Papers

Showing 551–600 of 1312 papers

Title | Status | Hype
ECG-EmotionNet: Nested Mixture of Expert (NMoE) Adaptation of ECG-Foundation Model for Driver Emotion Recognition | | 0
Unify and Anchor: A Context-Aware Transformer for Cross-Domain Time Series Forecasting | | 0
DeRS: Towards Extremely Efficient Upcycled Mixture-of-Experts Models | | 0
Explainable Classifier for Malignant Lymphoma Subtyping via Cell Graph and Image Fusion | | 0
CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering | | 0
CoSMoEs: Compact Sparse Mixture of Experts | | 0
UniCodec: Unified Audio Codec with Single Domain-Adaptive Codebook | | 0
Mixture of Experts for Recognizing Depression from Interview and Reading Tasks | | 0
Mixture of Experts-augmented Deep Unfolding for Activity Detection in IRS-aided Systems | | 0
Drop-Upcycling: Training Sparse Mixture of Experts with Partial Re-initialization | | 0
OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment | | 0
The Empirical Impact of Reducing Symmetries on the Performance of Deep Ensembles and MoE | | 0
Evaluating Expert Contributions in a MoE LLM for Quiz-Based Tasks | | 0
ENACT-Heart -- ENsemble-based Assessment Using CNN and Transformer on Heart Sounds | | 0
BigMac: A Communication-Efficient Mixture-of-Experts Model Structure for Fast Training and Inference | | 0
An Autonomous Network Orchestration Framework Integrating Large Language Models with Continual Reinforcement Learning | | 0
Binary-Integer-Programming Based Algorithm for Expert Load Balancing in Mixture-of-Experts Models | Code | 0
Tight Clusters Make Specialized Experts | Code | 0
Ray-Tracing for Conditionally Activated Neural Networks | | 0
Unraveling the Localized Latents: Learning Stratified Manifold Structures in LLM Embedding Space with Sparse Mixture-of-Experts | | 0
Every Expert Matters: Towards Effective Knowledge Distillation for Mixture-of-Experts Language Models | | 0
DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs | | 0
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines | | 0
Connector-S: A Survey of Connectors in Multi-modal Large Language Models | | 0
Fate: Fast Edge Inference of Mixture-of-Experts Models via Cross-Layer Gate | Code | 0
Mixture of Tunable Experts - Behavior Modification of DeepSeek-R1 at Inference Time | | 0
ClimateLLM: Efficient Weather Forecasting via Frequency-Aware Large Language Models | | 0
Probing Semantic Routing in Large Mixture-of-Expert Models | | 0
Eidetic Learning: an Efficient and Provable Solution to Catastrophic Forgetting | Code | 0
Mixture of Decoupled Message Passing Experts with Entropy Constraint for General Node Classification | | 0
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | | 0
Memory Analysis on the Training Course of DeepSeek Models | | 0
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | | 0
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | | 0
MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition | | 0
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Code | 0
Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Code | 0
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | | 0
Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion | | 0
Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | | 0
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | | 0
Mixture of neural operator experts for learning boundary conditions and model selection | | 0
Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach | | 0
ReGNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction | | 0
Brief analysis of DeepSeek R1 and it's implications for Generative AI | | 0
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | | 0
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | | 0
CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | | 0
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | | 0
Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Code | 0
Page 12 of 27
