| Jamba: A Hybrid Transformer-Mamba Language Model | Mar 28, 2024 | GPU, Language Modeling | Code Available | 0 |
| Generalization Error Analysis for Sparse Mixture-of-Experts: A Preliminary Study | Mar 26, 2024 | Learning Theory, Mixture-of-Experts | Unverified | 0 |
| Multi-Task Dense Prediction via Mixture of Low-Rank Experts | Mar 26, 2024 | Decoder, Mixture-of-Experts | Code Available | 2 |
| GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot | Mar 20, 2024 | Mixture-of-Experts, Multi-Task Learning | Unverified | 0 |
| DESIRE-ME: Domain-Enhanced Supervised Information REtrieval using Mixture-of-Experts | Mar 20, 2024 | Information Retrieval, Mixture-of-Experts | Code Available | 0 |
| Task-Customized Mixture of Adapters for General Image Fusion | Mar 19, 2024 | Mixture-of-Experts | Code Available | 2 |
| Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Mar 18, 2024 | Mixture-of-Experts, parameter-efficient fine-tuning | Code Available | 2 |
| Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters | Mar 18, 2024 | Continual Learning, Incremental Learning | Code Available | 3 |
| Skeleton-Based Human Action Recognition with Noisy Labels | Mar 15, 2024 | Action Recognition, Denoising | Code Available | 0 |
| Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts | Mar 14, 2024 | Denoising, Mixture-of-Experts | Code Available | 2 |