SOTAVerified

Mixture-of-Experts

Papers

Showing 1001–1050 of 1312 papers

| Title | Status | Hype |
|---|---|---|
| An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | | 0 |
| SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0 |
| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | | 0 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | | 0 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | Code | 0 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | | 0 |
| Towards Convergence Rates for Parameter Estimation in Gaussian-gated Mixture of Experts | | 0 |
| Locking and Quacking: Stacking Bayesian model predictions by log-pooling and superposition | | 0 |
| Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | | 0 |
| Demystifying Softmax Gating Function in Gaussian Mixture of Experts | | 0 |
| Steered Mixture-of-Experts Autoencoder Design for Real-Time Image Modelling and Denoising | | 0 |
| Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity | Code | 0 |
| Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism | | 0 |
| Revisiting Single-gated Mixtures of Experts | | 0 |
| FlexMoE: Scaling Large-scale Sparse Pre-trained Model Training via Dynamic Device Placement | | 0 |
| Mixed Regression via Approximate Message Passing | | 0 |
| Steered Mixture of Experts Regression for Image Denoising with Multi-Model-Inference | | 0 |
| Information Maximizing Curriculum: A Curriculum-Based Approach for Imitating Diverse Skills | Code | 0 |
| WM-MoE: Weather-aware Multi-scale Mixture-of-Experts for Blind Adverse Weather Removal | | 0 |
| Disguise without Disruption: Utility-Preserving Face De-Identification | | 0 |
| Improving Transformer Performance for French Clinical Notes Classification Using Mixture of Experts on a Limited Dataset | | 0 |
| HDformer: A Higher Dimensional Transformer for Diabetes Detection Utilizing Long Range Vascular Signals | | 0 |
| MCR-DL: Mix-and-Match Communication Runtime for Deep Learning | | 0 |
| Scaling Vision-Language Models with Sparse Mixture of Experts | | 0 |
| A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training | | 0 |
| Towards MoE Deployment: Mitigating Inefficiencies in Mixture-of-Expert (MoE) Inference | | 0 |
| Improving Expert Specialization in Mixture of Experts | | 0 |
| Improved Training of Mixture-of-Experts Language GANs | | 0 |
| TMoE-P: Towards the Pareto Optimum for Multivariate Soft Sensors | | 0 |
| Massively Multilingual Shallow Fusion with Large Language Models | | 0 |
| Fast, Differentiable and Sparse Top-k: a Convex Analysis Perspective | | 0 |
| Alternating Updates for Efficient Transformers | | 0 |
| PRUDEX-Compass: Towards Systematic Evaluation of Reinforcement Learning in Financial Markets | | 0 |
| AdaEnsemble: Learning Adaptively Sparse Structured Ensemble Network for Click-Through Rate Prediction | | 0 |
| Covariate-guided Bayesian mixture model for multivariate time series | Code | 0 |
| Semantic-Aware Dynamic Parameter for Video Inpainting Transformer | | 0 |
| Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners | | 0 |
| AdaMV-MoE: Adaptive Multi-Task Vision Mixture-of-Experts | | 0 |
| Memory-efficient NLLB-200: Language-specific Expert Pruning of a Massively Multilingual Machine Translation Model | | 0 |
| MultiCoder: Multi-Programming-Lingual Pre-Training for Low-Resource Code Completion | | 0 |
Page 21 of 27

No leaderboard results yet.