SOTAVerified

Mixture-of-Experts

Papers

Showing 301–350 of 1312 papers

Title | Status | Hype
MoHAVE: Mixture of Hierarchical Audio-Visual Experts for Robust Speech Recognition | — | 0
Training Sparse Mixture Of Experts Text Embedding Models | Code | 4
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks | — | 0
Memory Analysis on the Training Course of DeepSeek Models | — | 0
Jakiro: Boosting Speculative Decoding with Decoupled Multi-Head via MoE | Code | 1
MoETuner: Optimized Mixture of Expert Serving with Balanced Expert Placement and Token Routing | — | 0
MoEMba: A Mamba-based Mixture of Experts for High-Density EMG-based Hand Gesture Recognition | — | 0
Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline | Code | 0
Mol-MoE: Training Preference-Guided Routers for Molecule Generation | Code | 0
Leveraging Pre-Trained Models for Multimodal Class-Incremental Learning under Adaptive Fusion | — | 0
Towards Foundational Models for Dynamical System Reconstruction: Hierarchical Meta-Learning via Mixture of Experts | — | 0
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving | — | 0
Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient | — | 0
Mixture of neural operator experts for learning boundary conditions and model selection | — | 0
CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference | Code | 1
Optimizing Robustness and Accuracy in Mixture of Experts: A Dual-Model Approach | — | 0
ReGNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction | — | 0
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference | — | 0
Brief analysis of DeepSeek R1 and it's implications for Generative AI | — | 0
CLIP-UP: A Simple and Efficient Mixture-of-Experts CLIP Training Recipe with Sparse Upcycling | — | 0
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation | — | 0
MergeME: Model Merging Techniques for Homogeneous and Heterogeneous MoEs | — | 0
UniGraph2: Learning a Unified Embedding Space to Bind Multimodal Graphs | Code | 1
Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective | Code | 0
Sigmoid Self-Attention has Lower Sample Complexity than Softmax Self-Attention: A Mixture-of-Experts Perspective | — | 0
PM-MOE: Mixture of Experts on Private Model Parameters for Personalized Federated Learning | Code | 1
Pheromone-based Learning of Optimal Reasoning Paths | — | 0
Adaptive Prompt: Unlocking the Power of Visual Prompt Tuning | — | 0
MolGraph-xLSTM: A graph-based dual-level xLSTM framework with multi-head mixture-of-experts for enhanced molecular representation and interpretability | — | 0
Heuristic-Informed Mixture of Experts for Link Prediction in Multilayer Networks | — | 0
Free Agent in Agent-Based Mixture-of-Experts Generative AI Framework | — | 0
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow | — | 0
Static Batching of Irregular Workloads on GPUs: Framework and Application to Efficient MoE Model Inference | — | 0
ToMoE: Converting Dense Large Language Models to Mixture-of-Experts through Dynamic Structural Pruning | — | 0
FreqMoE: Enhancing Time Series Forecasting through Frequency Decomposition Mixture of Experts | Code | 1
Each Rank Could be an Expert: Single-Ranked Mixture of Experts LoRA for Multi-Task Learning | — | 0
Mean-field limit from general mixtures of experts to quantum neural networks | — | 0
Hierarchical Time-Aware Mixture of Experts for Multi-Modal Sequential Recommendation | Code | 1
Sparse Mixture-of-Experts for Non-Uniform Noise Reduction in MRI Images | — | 0
CSAOT: Cooperative Multi-Agent System for Active Object Tracking | — | 0
LLM4WM: Adapting LLM for Wireless Multi-Tasking | — | 0
UniUIR: Considering Underwater Image Restoration as An All-in-One Learner | — | 0
BLR-MoE: Boosted Language-Routing Mixture of Experts for Domain-Robust Multilingual E2E ASR | — | 0
Autonomy-of-Experts Models | — | 0
SCFCRC: Simultaneously Counteract Feature Camouflage and Relation Camouflage for Fraud Detection | — | 0
Modality Interactive Mixture-of-Experts for Fake News Detection | Code | 1
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models | — | 0
MoGERNN: An Inductive Traffic Predictor for Unobserved Locations in Dynamic Sensing Networks | Code | 1
Parameters vs FLOPs: Scaling Laws for Optimal Sparsity for Mixture-of-Experts Language Models | — | 0
FSMoE: A Flexible and Scalable Training System for Sparse Mixture-of-Experts Models | — | 0
Page 7 of 27

No leaderboard results yet.