SOTAVerified

Mixture-of-Experts

Papers

Showing 876–900 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Code | 1 |
| Memory Augmented Language Models through Mixture of Word Experts | — | 0 |
| Intentional Biases in LLM Responses | — | 0 |
| DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets | Code | 1 |
| CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval | — | 0 |
| Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE | Code | 0 |
| Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach | — | 0 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Code | 1 |
| QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Code | 2 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Code | 2 |
| SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation | Code | 1 |
| A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts | — | 0 |
| Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Code | 0 |
| Direct Neural Machine Translation with Task-level Mixture of Experts models | — | 0 |
| Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs | Code | 0 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Code | 1 |
| Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Code | 1 |
| Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer | — | 0 |
| Adaptive Gating in Mixture-of-Experts based Language Models | — | 0 |
| Sparse Universal Transformer | Code | 1 |
| Beyond the Typical: Modeling Rare Plausible Patterns in Chemical Reactions by Leveraging Sequential Mixture-of-Experts | — | 0 |
| Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion | Code | 0 |
| Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition | — | 0 |
| Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness | — | 0 |
| FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models | Code | 0 |
Page 36 of 53

No leaderboard results yet.