SOTAVerified

Mixture-of-Experts

Papers

Showing 101-150 of 1312 papers

Title | Status | Hype
Superposition in Transformers: A Novel Way of Building Mixture of Experts | Code | 2
I2MoE: Interpretable Multimodal Interaction-aware Mixture-of-Experts | Code | 2
Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Code | 2
XTrack: Multimodal Training Boosts RGB-X Video Object Trackers | Code | 2
Motion In-Betweening with Phase Manifolds | Code | 2
Efficiently Democratizing Medical LLMs for 50 Languages via a Mixture of Language Family Experts | Code | 2
Monet: Mixture of Monosemantic Experts for Transformers | Code | 2
No Language Left Behind: Scaling Human-Centered Machine Translation | Code | 2
MoFE-Time: Mixture of Frequency Domain Experts for Time-Series Forecasting Models | Code | 2
Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models | Code | 2
Med-MoE: Mixture of Domain-Specific Experts for Lightweight Medical Vision-Language Models | Code | 2
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks | Code | 2
A Closer Look into Mixture-of-Experts in Large Language Models | Code | 2
Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Code | 2
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models | Code | 2
Delta Decompression for MoE-based LLMs Compression | Code | 2
DeMo: Decoupled Feature-Based Mixture of Experts for Multi-Modal Object Re-Identification | Code | 2
Linear-MoE: Linear Sequence Modeling Meets Mixture-of-Experts | Code | 2
Mixture of Lookup Experts | Code | 2
CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Code | 2
Demystifying the Compression of Mixture-of-Experts Through a Unified Framework | Code | 2
Mixture of Tokens: Continuous MoE through Cross-Example Aggregation | Code | 2
CNMBERT: A Model for Converting Hanyu Pinyin Abbreviations to Chinese Characters | Code | 2
MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Code | 2
MC-MoE: Mixture Compressor for Mixture-of-Experts LLMs Gains More | Code | 2
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling | Code | 2
MDFEND: Multi-domain Fake News Detection | Code | 2
Mixture of A Million Experts | Code | 2
ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2
Object Detection using Event Camera: A MoE Heat Conduction based Detector and A New Benchmark Dataset | Code | 2
Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Code | 2
M4: Multi-Proxy Multi-Gate Mixture of Experts Network for Multiple Instance Learning in Histopathology Image Analysis | Code | 1
M^3ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design | Code | 1
M^4oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts | Code | 1
M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework | Code | 1
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset | Code | 1
M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework | Code | 1
LOLA -- An Open-Source Massively Multilingual Large Language Model | Code | 1
LLMBind: A Unified Modality-Task Integration Framework | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation | Code | 1
Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction | Code | 1
LITE: Modeling Environmental Ecosystems with Multimodal Large Language Models | Code | 1
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality | Code | 1
A Time Series is Worth Five Experts: Heterogeneous Mixture of Experts for Traffic Flow Prediction | Code | 1
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Learning to Skip the Middle Layers of Transformers | Code | 1
Learning Soccer Juggling Skills with Layer-wise Mixture-of-Experts | Code | 1
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts | Code | 1
Parameter Efficient Adaptation for Image Restoration with Heterogeneous Mixture-of-Experts | Code | 1
Page 3 of 27

No leaderboard results yet.