SOTAVerified

Mixture-of-Experts

Papers

Showing 901–950 of 1312 papers

Title | Status | Hype
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Code | 1
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | | 0
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Code | 2
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | | 0
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Code | 1
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0
Task-Based MoE for Multitask Multilingual Machine Translation | | 0
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | | 0
Fast Feedforward Networks | Code | 2
Motion In-Betweening with Phase Manifolds | Code | 2
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Code | 1
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | | 0
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Code | 1
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs | | 0
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1
Experts Weights Averaging: A New General Training Scheme for Vision Transformers | | 0
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction | | 0
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving | | 0
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Code | 2
MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Code | 1
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | | 0
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0
Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Code | 1
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | | 0
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Code | 1
Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | | 0
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0
ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Code | 1
COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Code | 1
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0
Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | | 0
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | Code | 1
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | Code | 0
Emergent Modularity in Pre-trained Transformers | Code | 1
Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | | 0
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | | 0
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | | 0
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | | 0
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | | 0
Lifelong Language Pretraining with Distribution-Specialized Experts | | 0
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Page 19 of 27
