SOTAVerified

Mixture-of-Experts

Papers

Showing 251–300 of 1,312 papers

Title | Status | Hype
Multi-Task Reinforcement Learning with Mixture of Orthogonal Experts | Code | 1
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets | Code | 1
SiDA-MoE: Sparsity-Inspired Data-Aware Serving for Efficient and Scalable Large Mixture-of-Experts Models | Code | 1
SteloCoder: a Decoder-Only LLM for Multi-Language to Python Code Translation | Code | 1
Image Super-resolution Via Latent Diffusion: A Sampling-space Mixture Of Experts And Frequency-augmented Decoder Approach | Code | 1
Merging Experts into One: Improving Computational Efficiency of Mixture of Experts | Code | 1
Sparse Universal Transformer | Code | 1
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Code | 1
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Code | 1
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Code | 1
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Code | 1
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1
MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Code | 1
Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Code | 1
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Code | 1
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Code | 1
COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Code | 1
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | Code | 1
Emergent Modularity in Pre-trained Transformers | Code | 1
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration | Code | 1
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Code | 1
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild | Code | 1
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering | Code | 1
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers | Code | 1
Mixture of Decision Trees for Interpretable Machine Learning | Code | 1
Spatial Mixture-of-Experts | Code | 1
PAD-Net: An Efficient Framework for Dynamic Networks | Code | 1
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Code | 1
AutoMoE: Heterogeneous Mixture-of-Experts with Adaptive Computation for Efficient Neural Machine Translation | Code | 1
Mixture of Attention Heads: Selecting Attention Heads Per Token | Code | 1
Meta-DMoE: Adapting to Domain Shift by Meta-Distillation from Mixture-of-Experts | Code | 1
Mask and Reason: Pre-Training Knowledge Graph Transformers for Complex Logical Queries | Code | 1
Towards Understanding Mixture of Experts in Deep Learning | Code | 1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1
Sparse Mixture-of-Experts are Domain Generalizable Learners | Code | 1
Patcher: Patch Transformers with Mixture of Experts for Precise Medical Image Segmentation | Code | 1
Addressing Confounding Feature Issue for Causal Recommendation | Code | 1
StableMoE: Stable Routing Strategy for Mixture of Experts | Code | 1
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation | Code | 1
3M: Multi-loss, Multi-path and Multi-level Neural Networks for speech recognition | Code | 1
Efficient and Degradation-Adaptive Network for Real-World Image Super-Resolution | Code | 1
SummaReranker: A Multi-Task Mixture-of-Experts Re-ranking Framework for Abstractive Summarization | Code | 1
Parameter-Efficient Mixture-of-Experts Architecture for Pre-trained Language Models | Code | 1
EvoMoE: An Evolutional Mixture-of-Experts Training Framework via Dense-To-Sparse Gate | Code | 1
Mimic Embedding via Adaptive Aggregation: Learning Generalizable Person Re-identification | Code | 1
Unsupervised Foreground Extraction via Deep Region Competition | Code | 1
HydraSum: Disentangling Stylistic Features in Text Summarization using Multi-Decoder Models | Code | 1
Page 6 of 27