| Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy | Oct 2, 2023 | Mixture-of-Experts | Code Available | 1 |
| MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection | Sep 26, 2023 | Instance Segmentation, Mixture-of-Experts | Code Available | 1 |
| LLMCarbon: Modeling the end-to-end Carbon Footprint of Large Language Models | Sep 25, 2023 | GPU, Mixture-of-Experts | Code Available | 1 |
| Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts | Sep 25, 2023 | Density Estimation, Mixture-of-Experts | Unverified | 0 |
| Pushing Mixture of Experts to the Limit: Extremely Parameter Efficient MoE for Instruction Tuning | Sep 11, 2023 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | Sep 8, 2023 | Mixture-of-Experts | Unverified | 0 |
| Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis | Sep 7, 2023 | Image Generation, Mixture-of-Experts | Code Available | 1 |
| Learning multi-modal generative models with permutation-invariant encoders and tighter variational objectives | Sep 1, 2023 | Mixture-of-Experts | Code Available | 0 |
| Task-Based MoE for Multitask Multilingual Machine Translation | Aug 30, 2023 | Machine Translation, Mixture-of-Experts | Unverified | 0 |
| SwapMoE: Serving Off-the-shelf MoE-based Large Language Models with Tunable Memory Budget | Aug 29, 2023 | Mixture-of-Experts, object-detection | Unverified | 0 |
| Fast Feedforward Networks | Aug 28, 2023 | Mixture-of-Experts | Code Available | 2 |
| Motion In-Betweening with Phase Manifolds | Aug 24, 2023 | Mixture-of-Experts, motion in-betweening | Code Available | 2 |
| Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference | Aug 23, 2023 | CPU, GPU | Code Available | 1 |
| EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE | Aug 23, 2023 | Image-text matching, Image-text Retrieval | Unverified | 0 |
| Enhancing NeRF akin to Enhancing LLMs: Generalizable NeRF Transformer with Mixture-of-View-Experts | Aug 22, 2023 | Mixture-of-Experts, NeRF | Code Available | 1 |
| Beyond Sharing: Conflict-Aware Multivariate Time Series Anomaly Detection | Aug 17, 2023 | Anomaly Detection, Mixture-of-Experts | Code Available | 0 |
| FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs | Aug 16, 2023 | GPU, Mixture-of-Experts | Unverified | 0 |
| HyperFormer: Enhancing Entity and Relation Interaction for Hyper-Relational Knowledge Graph Completion | Aug 12, 2023 | Attribute, Knowledge Graph Completion | Code Available | 1 |
| Experts Weights Averaging: A New General Training Scheme for Vision Transformers | Aug 11, 2023 | Mixture-of-Experts | Unverified | 0 |
| A Novel Temporal Multi-Gate Mixture-of-Experts Approach for Vehicle Trajectory and Driving Intention Prediction | Aug 1, 2023 | Mixture-of-Experts, Position | Unverified | 0 |
| Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving | Jul 30, 2023 | Autonomous Driving, Mixture-of-Experts | Unverified | 0 |
| TaskExpert: Dynamically Assembling Multi-Task Representations with Memorial Mixture-of-Experts | Jul 28, 2023 | Long-range modeling, Mixture-of-Experts | Code Available | 2 |
| MLP Fusion: Towards Efficient Fine-tuning of Dense and Mixture-of-Experts Language Models | Jul 18, 2023 | Language Modelling, Mixture-of-Experts | Code Available | 1 |
| Domain-Agnostic Neural Architecture for Class Incremental Continual Learning in Document Processing Platform | Jul 11, 2023 | Continual Learning, Mixture-of-Experts | Code Available | 0 |
| Bidirectional Attention as a Mixture of Continuous Word Experts | Jul 8, 2023 | Language Modelling, Mixture-of-Experts | Code Available | 0 |
| An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training | Jun 29, 2023 | Continual Learning, Mixture-of-Experts | Unverified | 0 |
| Chatlaw: A Multi-Agent Collaborative Legal Assistant with Knowledge Graph Enhanced Mixture-of-Experts Large Language Model | Jun 28, 2023 | Hallucination, Knowledge Graphs | Code Available | 5 |
| SkillNet-X: A Multilingual Multitask Model with Sparsely Activated Skills | Jun 28, 2023 | Mixture-of-Experts, Natural Language Understanding | Unverified | 0 |
| JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving | Jun 19, 2023 | In-Context Learning, Language Modeling | Unverified | 0 |
| Deep learning techniques for blind image super-resolution: A high-scale multi-domain perspective evaluation | Jun 15, 2023 | Image Quality Assessment, Image Super-Resolution | Code Available | 1 |
| Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings | Jun 14, 2023 | Diversity, Federated Learning | Unverified | 0 |
| ShiftAddViT: Mixture of Multiplication Primitives Towards Efficient Vision Transformer | Jun 10, 2023 | Efficient ViTs, Mixture-of-Experts | Code Available | 1 |
| Attention Weighted Mixture of Experts with Contrastive Learning for Personalized Ranking in E-commerce | Jun 8, 2023 | Contrastive Learning, Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts | Jun 8, 2023 | Language Modeling | Code Available | 0 |
| ModuleFormer: Modularity Emerges from Mixture-of-Experts | Jun 7, 2023 | Language Modelling, Lightweight Deployment | Code Available | 2 |
| Patch-level Routing in Mixture-of-Experts is Provably Sample-efficient for Convolutional Neural Networks | Jun 7, 2023 | Mixture-of-Experts | Code Available | 1 |
| COMET: Learning Cardinality Constrained Mixture of Experts with Trees and Local Search | Jun 5, 2023 | Language Modeling | Code Available | 1 |
| Revisiting Hate Speech Benchmarks: From Data Curation to System Deployment | Jun 1, 2023 | Benchmarking, Hate Speech Detection | Code Available | 0 |
| Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking | Jun 1, 2023 | Dialogue State Tracking, Mixture-of-Experts | Unverified | 0 |
| Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts | May 30, 2023 | CPU, GPU | Code Available | 1 |
| RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths | May 29, 2023 | Image Generation, Mixture-of-Experts | Code Available | 0 |
| Emergent Modularity in Pre-trained Transformers | May 28, 2023 | Mixture-of-Experts | Code Available | 1 |
| Modeling Task Relationships in Multi-variate Soft Sensor with Balanced Mixture-of-Experts | May 25, 2023 | Mixture-of-Experts | Unverified | 0 |
| Mixture-of-Experts Meets Instruction Tuning: A Winning Combination for Large Language Models | May 24, 2023 | Mixture-of-Experts, Zero-shot Generalization | Unverified | 0 |
| Condensing Multilingual Knowledge with Lightweight Language-Specific Modules | May 23, 2023 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| Towards A Unified View of Sparse Feed-Forward Network in Pretraining Large Language Model | May 23, 2023 | Avg, Language Modeling | Unverified | 0 |
| Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding | May 23, 2023 | Citation Prediction, Contrastive Learning | Unverified | 0 |
| To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis | May 22, 2023 | Mixture-of-Experts | Unverified | 0 |
| Lifelong Language Pretraining with Distribution-Specialized Experts | May 20, 2023 | Lifelong learning, Mixture-of-Experts | Unverified | 0 |
| Lifting the Curse of Capacity Gap in Distilling Language Models | May 20, 2023 | Knowledge Distillation, Mixture-of-Experts | Code Available | 1 |