SOTAVerified

Mixture-of-Experts

Papers

Showing 901–950 of 1312 papers

Title | Status | Hype
Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Code | 1
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Code | 1
LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Code | 1
Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | | 0
Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Code | 2
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | | 0
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Code | 1
Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Code | 0
Task-Based MoE for Multitask Multilingual Machine Translation | | 0
SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | | 0
Fast Feedforward Networks | Code | 2
Motion In-Betweening with Phase Manifolds | Code | 2
Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Code | 1
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | | 0
Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Code | 1
Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Code | 0
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs | | 0
HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Code | 1
Experts Weights Averaging: A New General Training Scheme for Vision Transformers | | 0
A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction | | 0
Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving | | 0
TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Code | 2
MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Code | 1
Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Code | 0
Bidirectional Attention as a Mixture of Continuous Word Experts | Code | 0
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | | 0
Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Code | 5
SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | | 0
JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | | 0
Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Code | 1
Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | | 0
ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Code | 1
Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | | 0
Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Code | 0
ModuleFormer: Modularity Emerges from Mixture-of-Experts | Code | 2
Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Code | 1
COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Code | 1
Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Code | 0
Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | | 0
Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | Code | 1
RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | Code | 0
Emergent Modularity in Pre-trained Transformers | Code | 1
Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | | 0
Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | | 0
Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | Code | 0
Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | | 0
Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | | 0
To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | | 0
Lifelong Language Pretraining with Distribution-Specialized Experts | | 0
Lifting the Curse of Capacity Gap in Distilling Language Models | Code | 1
Page 19 of 27
