
Mixture-of-Experts

Papers

Showing 401-450 of 1,312 papers

Title | Status | Hype
CoLA: Collaborative Low-Rank Adaptation | Code | 0
MoRE-Brain: Routed Mixture of Experts for Interpretable and Generalizable Cross-Subject fMRI Visual Decoding | Code | 0
Not All Models Suit Expert Offloading: On Local Routing Consistency of Mixture-of-Expert Models | Code | 0
Efficient Data Driven Mixture-of-Expert Extraction from Trained Networks | – | 0
Hunyuan-TurboS: Advancing Large Language Models through Mamba-Transformer Synergy and Adaptive Chain-of-Thought | – | 0
Towards Rehearsal-Free Continual Relation Extraction: Capturing Within-Task Variance with Adaptive Prompting | Code | 0
Multimodal Cultural Safety: Evaluation Frameworks and Alignment Strategies | Code | 0
FuxiMT: Sparsifying Large Language Models for Chinese-Centric Multilingual Machine Translation | – | 0
StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning | – | 0
Multimodal Mixture of Low-Rank Experts for Sentiment Analysis and Emotion Recognition | – | 0
THOR-MoE: Hierarchical Task-Guided and Context-Responsive Routing for Neural Machine Translation | – | 0
EfficientLLM: Efficiency in Large Language Models | – | 0
Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training | – | 0
Balanced and Elastic End-to-end Training of Dynamic LLMs | – | 0
Scaling and Enhancing LLM-based AVSR: A Sparse Mixture of Projectors Approach | – | 0
CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition | Code | 0
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures | – | 0
Seeing the Unseen: How EMoE Unveils Bias in Text-to-Image Diffusion Models | – | 0
True Zero-Shot Inference of Dynamical Systems Preserving Long-Term Statistics | – | 0
MINGLE: Mixtures of Null-Space Gated Low-Rank Experts for Test-Time Continual Model Merging | – | 0
Improving Coverage in Combined Prediction Sets with Weighted p-values | – | 0
Multi-modal Collaborative Optimization and Expansion Network for Event-assisted Single-eye Expression Recognition | Code | 0
Model Merging in Pre-training of Large Language Models | – | 0
A Fast Kernel-based Conditional Independence test with Application to Causal Discovery | – | 0
On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating | – | 0
MegaScale-MoE: Large-Scale Communication-Efficient Training of Mixture-of-Experts Models in Production | – | 0
MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems | – | 0
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures | – | 0
PT-MoE: An Efficient Finetuning Framework for Integrating Mixture-of-Experts into Prompt Tuning | Code | 0
AM-Thinking-v1: Advancing the Frontier of Reasoning at 32B Scale | – | 0
PWC-MoE: Privacy-Aware Wireless Collaborative Mixture of Experts | – | 0
UMoE: Unifying Attention and FFN with Shared Experts | – | 0
FreqMoE: Dynamic Frequency Enhancement for Neural PDE Solvers | – | 0
Seed1.5-VL Technical Report | – | 0
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts | – | 0
QoS-Efficient Serving of Multiple Mixture-of-Expert LLMs Using Partial Runtime Reconfiguration | – | 0
FloE: On-the-Fly MoE Inference on Memory-constrained GPU | – | 0
Divide-and-Conquer: Cold-Start Bundle Recommendation via Mixture of Diffusion Experts | – | 0
SToLa: Self-Adaptive Touch-Language Framework with Tactile Commonsense Reasoning in Open-Ended Scenarios | – | 0
LLM-e Guess: Can LLMs Capabilities Advance Without Hardware Progress? | Code | 0
Pangu Ultra MoE: How to Train Your Big MoE on Ascend NPUs | – | 0
Faster MoE LLM Inference for Extremely Large Models | – | 0
3D Gaussian Splatting Data Compression with Mixture of Priors | – | 0
STAR-Rec: Making Peace with Length Variance and Pattern Diversity in Sequential Recommendation | – | 0
Towards Smart Point-and-Shoot Photography | – | 0
Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques | – | 0
Finger Pose Estimation for Under-screen Fingerprint Sensor | Code | 0
Multimodal Deep Learning-Empowered Beam Prediction in Future THz ISAC Systems | – | 0
Perception-Informed Neural Networks: Beyond Physics-Informed Neural Networks | – | 0
CoCoAFusE: Beyond Mixtures of Experts via Model Fusion | – | 0
Page 9 of 27
