On Nondeterminism and Instability in Optimizing Neural Networks
Anonymous
Abstract
Nondeterminism in optimization causes uncertainty when improving neural networks, with small gains in performance difficult to discern from run-to-run variability. While this uncertainty can be reduced by training multiple copies of a model with different random seeds, doing so is time-consuming, costly, and energy-inefficient. Despite this, little attention has been paid to understanding this problem. In this work, we establish an experimental protocol for measuring the effect of optimization nondeterminism on model diversity, which allows us to study the independent effects of individual nondeterminism sources, including random parameter initialization, data augmentation, data shuffling, and even low-level nondeterminism in popular accelerator libraries such as cuDNN. Surprisingly, we find that changes to each source of nondeterminism have similar effects on measures of model diversity. To explain this intriguing fact, we examine and identify the instability of model training, taken as an end-to-end procedure, as the key determinant: even one-bit changes in initial model parameters result in models that converge to vastly different values. Based on this, we place a lower bound of 10^7 on the condition number of model training for the example architectures considered.
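To make the "one-bit change" concrete: a minimal sketch (not the paper's code; the function name and values are illustrative) of flipping the least significant mantissa bit of a single float32 parameter, the kind of perturbation the abstract refers to:

```python
import numpy as np

def flip_lsb(x):
    """Return x with the lowest bit of its float32 representation flipped.

    Illustrative helper (assumed, not from the paper): reinterprets the
    float32 bit pattern as uint32, XORs the least significant bit, and
    reinterprets back.
    """
    bits = np.array(x, dtype=np.float32).view(np.uint32)
    flipped = bits ^ np.uint32(1)
    return flipped.view(np.float32)[()]

w = np.float32(0.1)
w_perturbed = flip_lsb(w)

# The perturbation is one unit in the last place: numerically negligible,
# yet per the paper, enough to make training converge to a different model.
print(w_perturbed != w)                   # the two values differ
print(abs(float(w_perturbed) - float(w)))  # on the order of 1e-9 near 0.1
```

The relative size of this change is about 2^-24 (float32 machine epsilon), which is what makes the paper's 10^7 lower bound on the condition number of end-to-end training striking.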