SOTAVerified

Mixture-of-Experts

Papers

Showing 976-1000 of 1312 papers

Title | Status | Hype
Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach |  | 0
A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts |  | 0
Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Code | 0
Direct Neural Machine Translation with Task-level Mixture of Experts models |  | 0
Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs | Code | 0
Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer |  | 0
Adaptive Gating in Mixture-of-Experts based Language Models |  | 0
Beyond the Typical: Modeling Rare Plausible Patterns in Chemical Reactions by Leveraging Sequential Mixture-of-Experts |  | 0
Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion | Code | 0
Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition |  | 0
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness |  | 0
FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models | Code | 0
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts |  | 0
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts |  | 0
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0
Task-Based MoE for Multitask Multilingual Machine Translation |  | 0
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget |  | 0
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE |  | 0
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs |  | 0
Experts Weights Averaging: A New General Training Scheme for Vision Transformers |  | 0
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction |  | 0
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving |  | 0
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0

No leaderboard results yet.