| Title | Date | Tags | Code | Count |
|---|---|---|---|---|
| LOLA -- An Open-Source Massively Multilingual Large Language Model | Sep 17, 2024 | Diversity, Language Modeling | Code Available | 1 |
| LPT++: Efficient Training on Mixture of Long-tailed Experts | Sep 17, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Unverified | 0 |
| Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression | Sep 16, 2024 | Denoising, Mixture-of-Experts | Unverified | 0 |
| Integrating AI's Carbon Footprint into Risk Management Frameworks: Strategies and Tools for Sustainable Compliance in Banking Sector | Sep 15, 2024 | Cloud Computing, Management | Unverified | 0 |
| MiniDrive: More Efficient Vision-Language Models with Multi-Level 2D Features as Text Tokens for Autonomous Driving | Sep 11, 2024 | Autonomous Driving, Feature Engineering | Code Available | 2 |
| DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models | Sep 10, 2024 | Mixture-of-Experts | Unverified | 0 |
| VE: Modeling Multivariate Time Series Correlation with Variate Embedding | Sep 10, 2024 | Mixture-of-Experts, Multivariate Time Series Forecasting | Code Available | 0 |
| STUN: Structured-Then-Unstructured Pruning for Scalable MoE Pruning | Sep 10, 2024 | GSM8K, Mixture-of-Experts | Unverified | 0 |
| M3-Jepa: Multimodal Alignment via Multi-directional MoE based on the JEPA framework | Sep 9, 2024 | Computational Efficiency, Cross-Modal Retrieval | Code Available | 1 |
| Adapted-MoE: Mixture of Experts with Test-Time Adaption for Anomaly Detection | Sep 9, 2024 | Anomaly Detection, Mixture-of-Experts | Unverified | 0 |
| Interpretable mixture of experts for time series prediction under recurrent and non-recurrent conditions | Sep 5, 2024 | Mixture-of-Experts, Time Series | Unverified | 0 |
| Pluralistic Salient Object Detection | Sep 4, 2024 | Mixture-of-Experts, Object | Unverified | 0 |
| Configurable Foundation Models: Building LLMs from a Modular Perspective | Sep 4, 2024 | Computational Efficiency, Mixture-of-Experts | Unverified | 0 |
| Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model | Sep 3, 2024 | Language Identification, Mixture-of-Experts | Unverified | 0 |
| OLMoE: Open Mixture-of-Experts Language Models | Sep 3, 2024 | Language Modeling, Language Modelling | Code Available | 4 |
| Duplex: A Device for Large Language Models with Mixture of Experts, Grouped Query Attention, and Continuous Batching | Sep 2, 2024 | Mixture-of-Experts | Unverified | 0 |
| Beyond Parameter Count: Implicit Bias in Soft Mixture of Experts | Sep 2, 2024 | Mixture-of-Experts | Unverified | 0 |
| Gradient-free variational learning with conditional mixture networks | Aug 29, 2024 | Computational Efficiency, Mixture-of-Experts | Code Available | 1 |
| Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts | Aug 28, 2024 | Mixture-of-Experts | Unverified | 0 |
| Nexus: Specialization meets Adaptability for Efficiently Training Mixture of Experts | Aug 28, 2024 | Mixture-of-Experts | Unverified | 0 |
| LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation | Aug 28, 2024 | Computational Efficiency, Hallucination | Code Available | 3 |
| Parameter-Efficient Quantized Mixture-of-Experts Meets Vision-Language Instruction Tuning for Semiconductor Electron Micrograph Analysis | Aug 27, 2024 | Instruction Following, Language Modeling | Unverified | 0 |
| Advancing Enterprise Spatio-Temporal Forecasting Applications: Data Mining Meets Instruction Tuning of Language Models For Multi-modal Time Series Analysis in Low-Resource Settings | Aug 24, 2024 | Decision Making, Mixture-of-Experts | Unverified | 0 |
| The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities | Aug 23, 2024 | Computational Efficiency, Inference Optimization | Unverified | 0 |
| La-SoftMoE CLIP for Unified Physical-Digital Face Attack Detection | Aug 23, 2024 | Mixture-of-Experts | Unverified | 0 |
| Multi-Treatment Multi-Task Uplift Modeling for Enhancing User Growth | Aug 23, 2024 | Causal Inference, Mixture-of-Experts | Unverified | 0 |
| DutyTTE: Deciphering Uncertainty in Origin-Destination Travel Time Estimation | Aug 23, 2024 | Deep Reinforcement Learning, Mixture-of-Experts | Code Available | 0 |
| SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging | Aug 22, 2024 | Diversity, Mixture-of-Experts | Unverified | 0 |
| Jamba-1.5: Hybrid Transformer-Mamba Models at Scale | Aug 22, 2024 | Chatbot, Instruction Following | Code Available | 5 |
| Improving Factuality in Large Language Models via Decoding-Time Hallucinatory and Truthful Comparators | Aug 22, 2024 | Hallucination, Mixture-of-Experts | Code Available | 0 |
| MoE-LPR: Multilingual Extension of Large Language Models through Mixture-of-Experts with Language Priors Routing | Aug 21, 2024 | Mixture-of-Experts | Code Available | 0 |
| FedMoE: Personalized Federated Learning via Heterogeneous Mixture of Experts | Aug 21, 2024 | Federated Learning, Heuristic Search | Unverified | 0 |
| KAN4TSF: Are KAN and KAN-based models Effective for Time Series Forecasting? | Aug 21, 2024 | Mixture-of-Experts, Time Series | Code Available | 2 |
| HMoE: Heterogeneous Mixture of Experts for Language Modeling | Aug 20, 2024 | Computational Efficiency, Language Modeling | Unverified | 0 |
| Navigating Spatio-Temporal Heterogeneity: A Graph Transformer Approach for Traffic Forecasting | Aug 20, 2024 | Attribute, Mixture-of-Experts | Code Available | 1 |
| AnyGraph: Graph Foundation Model in the Wild | Aug 20, 2024 | Graph Learning, Mixture-of-Experts | Code Available | 3 |
| AdapMoE: Adaptive Sensitivity-based Expert Gating and Management for Efficient MoE Inference | Aug 19, 2024 | Management, Mixture-of-Experts | Code Available | 1 |
| A Unified Framework for Iris Anti-Spoofing: Introducing IrisGeneral Dataset and Masked-MoE Method | Aug 19, 2024 | Iris Recognition, Mixture-of-Experts | Unverified | 0 |
| Customizing Language Models with Instance-wise LoRA for Sequential Recommendation | Aug 19, 2024 | Mixture-of-Experts, Multi-Task Learning | Code Available | 1 |
| FEDKIM: Adaptive Federated Knowledge Injection into Medical Foundation Models | Aug 17, 2024 | Federated Learning, Mixture-of-Experts | Code Available | 0 |
| Integrating Multi-view Analysis: Multi-view Mixture-of-Expert for Textual Personality Detection | Aug 16, 2024 | Mixture-of-Experts | Code Available | 0 |
| FactorLLM: Factorizing Knowledge via Mixture of Experts for Large Language Models | Aug 15, 2024 | Mixture-of-Experts | Code Available | 0 |
| BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts | Aug 15, 2024 | Mixture-of-Experts | Unverified | 0 |
| A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning | Aug 13, 2024 | Mixture-of-Experts, Survey | Unverified | 0 |
| AquilaMoE: Efficient Training for MoE Models with Scale-Up and Scale-Out Strategies | Aug 13, 2024 | Language Modelling, Mixture-of-Experts | Code Available | 1 |
| Layerwise Recurrent Router for Mixture-of-Experts | Aug 13, 2024 | Attribute, Mixture-of-Experts | Code Available | 1 |
| HoME: Hierarchy of Multi-Gate Experts for Multi-Task Learning at Kuaishou | Aug 10, 2024 | Mixture-of-Experts, Multi-Task Learning | Unverified | 0 |
| Understanding the Performance and Estimating the Cost of LLM Fine-Tuning | Aug 8, 2024 | GPU, Mixture-of-Experts | Code Available | 0 |
| LaDiMo: Layer-wise Distillation Inspired MoEfier | Aug 8, 2024 | Knowledge Distillation, Mixture-of-Experts | Unverified | 0 |
| MoC-System: Efficient Fault Tolerance for Sparse Mixture-of-Experts Model Training | Aug 8, 2024 | Mixture-of-Experts | Unverified | 0 |