Discovering Weight Initializers with Meta Learning

2021-05-20 · ICML Workshop AutoML 2021 · Code Available

Dmitry Baranchuk, Artem Babenko


Abstract

Deep neural network training largely depends on the choice of initial weight distribution, yet this choice is often nontrivial. Existing theoretical results for this problem mostly cover simple architectures, e.g., feedforward networks with ReLU activations. The architectures used for practical problems are more complex and often incorporate many overlapping modules, making them challenging for theoretical analysis. Practitioners are therefore left with heuristic initializers of questionable optimality and stability. In this study, we propose a task-agnostic approach that discovers initializers for specific network architectures and optimizers by learning the initial weight distributions directly via meta-learning. In several supervised and unsupervised learning scenarios, we show the advantage of our initializers in terms of both faster convergence and higher model performance.
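The core idea of the abstract, learning an initial weight distribution so that a fixed optimizer converges faster, can be illustrated with a hypothetical toy sketch (this is not the authors' method or code). Here a single initializer parameter, the init standard deviation `sigma` of a linear model, is meta-learned: the meta-objective is the loss reached after a few inner SGD steps, averaged over random seeds, and `sigma` is updated with a finite-difference meta-gradient. All names (`inner_train_loss`, `meta_objective`) and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy task: linear regression on random data (illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
true_w = rng.normal(size=8)
y = X @ true_w

def inner_train_loss(sigma, seed, steps=20, lr=0.05):
    """Train from w ~ N(0, sigma^2) with SGD for a few steps; return final MSE."""
    w = np.random.default_rng(seed).normal(scale=sigma, size=8)
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return float(np.mean((X @ w - y) ** 2))

def meta_objective(sigma, n_seeds=8):
    """Expected post-training loss as a function of the initializer scale."""
    return float(np.mean([inner_train_loss(sigma, s) for s in range(n_seeds)]))

# Outer (meta) loop: adjust sigma with a finite-difference meta-gradient.
sigma, meta_lr, eps = 2.0, 0.1, 1e-3
for _ in range(30):
    g = (meta_objective(sigma + eps) - meta_objective(sigma - eps)) / (2 * eps)
    sigma = max(sigma - meta_lr * g, 1e-3)  # keep the scale positive
```

After the outer loop, `sigma` has moved toward a scale from which the fixed inner optimizer reaches a lower loss within its step budget. The paper's approach operates in the same spirit but learns full per-layer weight distributions for real architectures and optimizers rather than a single scalar scale.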
