SOTAVerified

Mixture-of-Experts

Papers

Showing 851–900 of 1312 papers

| Title | Status | Hype |
| --- | --- | --- |
| k-Winners-Take-All Ensemble Neural Network | Code | 0 |
| Fast Inference of Mixture-of-Experts Language Models with Offloading | Code | 4 |
| Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation | — | 0 |
| Agent4Ranking: Semantic Robust Ranking via Personalized Query Rewriting Using Multi-agent LLM | — | 0 |
| SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling | Code | 3 |
| FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing | Code | 1 |
| Aurora: Activating Chinese chat capability for Mixtral-8x7B sparse Mixture-of-Experts through Instruction-Tuning | Code | 2 |
| Generator Assisted Mixture of Experts For Feature Acquisition in Batch | — | 0 |
| Mixture of Cluster-conditional LoRA Experts for Vision-language Instruction Tuning | — | 0 |
| From Google Gemini to OpenAI Q* (Q-Star): A Survey of Reshaping the Generative Artificial Intelligence (AI) Research Landscape | — | 0 |
| When Parameter-efficient Tuning Meets General-purpose Vision-language Models | Code | 1 |
| LoRAMoE: Alleviate World Knowledge Forgetting in Large Language Models via MoE-Style Plugin | Code | 2 |
| Online Action Recognition for Human Risk Prediction with Anticipated Haptic Alert via Wearables | Code | 0 |
| Training of Neural Networks with Uncertain Data: A Mixture of Experts Approach | — | 0 |
| SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention | Code | 1 |
| Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts | Code | 1 |
| HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts | Code | 1 |
| Mixture-of-Linear-Experts for Long-term Time Series Forecasting | Code | 1 |
| GraphMETRO: Mitigating Complex Graph Distribution Shifts via Mixture of Aligned Experts | Code | 1 |
| MoE-AMC: Enhancing Automatic Modulation Classification Performance Using Mixture-of-Experts | — | 0 |
| MoEC: Mixture of Experts Implicit Neural Compression | — | 0 |
| Language-driven All-in-one Adverse Weather Removal | — | 0 |
| Omni-SMoLA: Boosting Generalist Multimodal Models with Soft Mixture of Low-rank Experts | — | 0 |
| HOMOE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts | — | 0 |
| Efficient Model Agnostic Approach for Implicit Neural Representation Based Arbitrary-Scale Image Super-Resolution | — | 0 |
| Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Code | 1 |
| Memory Augmented Language Models through Mixture of Word Experts | — | 0 |
| Intentional Biases in LLM Responses | — | 0 |
| DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets | Code | 1 |
| CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval | — | 0 |
| Octavius: Mitigating Task Interference in MLLMs via LoRA-MoE | Code | 0 |
| Mixture-of-Experts for Open Set Domain Adaptation: A Dual-Space Detection Approach | — | 0 |
| SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Code | 1 |
| QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models | Code | 2 |
| Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Code | 2 |
| SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation | Code | 1 |
| A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts | — | 0 |
| Manifold-Preserving Transformers are Effective for Short-Long Range Encoding | Code | 0 |
| Direct Neural Machine Translation with Task-level Mixture of Experts models | — | 0 |
| Multi-view Contrastive Learning for Entity Typing over Knowledge Graphs | Code | 0 |
| Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Code | 1 |
| Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Code | 1 |
| Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer | — | 0 |
| Adaptive Gating in Mixture-of-Experts based Language Models | — | 0 |
| Sparse Universal Transformer | Code | 1 |
| Beyond the Typical: Modeling Rare Plausible Patterns in Chemical Reactions by Leveraging Sequential Mixture-of-Experts | — | 0 |
| Exploiting Activation Sparsity with Dense to Dynamic-k Mixture-of-Experts Conversion | Code | 0 |
| Reinforcement Learning-based Mixture of Vision Transformers for Video Violence Recognition | — | 0 |
| Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness | — | 0 |
| FT-Shield: A Watermark Against Unauthorized Fine-tuning in Text-to-Image Diffusion Models | Code | 0 |
Page 18 of 27

No leaderboard results yet.