| A Mixture of h - 1 Heads is Better than h Heads | Jul 1, 2020 | Language Modeling, Language Modelling | Unverified | 0 |
| GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding | Jun 30, 2020 | Machine Translation, Mixture-of-Experts | Code Available | 0 |
| Task-Agnostic Online Reinforcement Learning with an Infinite Mixture of Gaussian Processes | Jun 19, 2020 | Continual Learning, Decision Making | Code Available | 1 |
| Model Agnostic Combination for Ensemble Learning | Jun 16, 2020 | Ensemble Learning, Mixture-of-Experts | Unverified | 0 |
| An efficient application of Bayesian optimization to an industrial MDO framework for aircraft design | Jun 12, 2020 | Bayesian Optimization, global-optimization | Unverified | 0 |
| Fast Deep Mixtures of Gaussian Process Experts | Jun 11, 2020 | Gaussian Processes, Mixture-of-Experts | Unverified | 0 |
| Catching Attention with Automatic Pull Quote Selection | May 27, 2020 | Articles, Mixture-of-Experts | Code Available | 0 |
| A Tree Architecture of LSTM Networks for Sequential Regression with Missing Data | May 22, 2020 | Mixture-of-Experts, regression | Unverified | 0 |
| A Mixture of h-1 Heads is Better than h Heads | May 13, 2020 | Language Modeling, Language Modelling | Unverified | 0 |
| Machine learning based digital twin for dynamical systems with multiple time-scales | May 12, 2020 | BIG-bench Machine Learning, Mixture-of-Experts | Unverified | 0 |