Pushing the bounds of dropout

2018-05-23 · ICLR 2019

Gábor Melis, Charles Blundell, Tomáš Kočiský, Karl Moritz Hermann, Chris Dyer, Phil Blunsom

Code

github.com/deepmind/lamb (TensorFlow)

Abstract

We show that dropout training is best understood as performing MAP estimation concurrently for a family of conditional models whose objectives are themselves lower bounded by the original dropout objective. This discovery allows us to pick any model from this family after training, which leads to a substantial improvement on regularisation-heavy language modelling. The family includes models that compute a power mean over the sampled dropout masks, and their less stochastic subvariants with tighter and higher lower bounds than the fully stochastic dropout objective. We argue that since the deterministic subvariant's bound is equal to its objective and is the highest amongst these models, the predominant view of it as a good approximation to MC averaging is misleading. Rather, deterministic dropout is the best available approximation to the true objective.
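Some of the bound relations in the abstract can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the paper's implementation (which lives in the lamb repository): it uses a hypothetical single linear layer followed by a softmax, inverted dropout on 6 input units, and exact enumeration of all 2**6 masks instead of sampling. It computes the dropout objective E_mask[log p(y | x, mask)], the log of the weighted power mean (sum_i w_i p_i^alpha)^(1/alpha) of the target-class probability over masks for a few exponents alpha, and the log-likelihood of the deterministic (no-mask) prediction. The power means are left unnormalized across classes; the paper's exact model definitions may differ.

```python
import itertools
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def enumerate_masks(dim, keep_prob):
    # All 2**dim binary dropout masks with their exact probabilities.
    masks = np.array(list(itertools.product([0.0, 1.0], repeat=dim)))
    kept = masks.sum(axis=1)
    probs = keep_prob ** kept * (1.0 - keep_prob) ** (dim - kept)
    return masks, probs

def log_power_mean(log_p, weights, alpha):
    # log of the weighted power mean (sum_i w_i * p_i**alpha) ** (1 / alpha).
    # alpha -> 0 is the weighted geometric mean, whose log is sum_i w_i * log p_i.
    if alpha == 0.0:
        return float(np.dot(weights, log_p))
    # log-sum-exp for stability: log sum_i exp(alpha * log p_i + log w_i)
    a = alpha * log_p + np.log(weights)
    m = a.max()
    return float((m + np.log(np.exp(a - m).sum())) / alpha)

# Hypothetical toy model: one linear layer + softmax, dropout on the inputs.
rng = np.random.default_rng(0)
dim, n_classes, keep_prob = 6, 4, 0.5
x = rng.normal(size=dim)
W = rng.normal(size=(dim, n_classes))
b = rng.normal(size=n_classes)
y = 2  # target class index

masks, probs = enumerate_masks(dim, keep_prob)
# Inverted dropout: scale kept units by 1/keep_prob so that E[mask / keep_prob] = 1.
logits = (x * masks / keep_prob) @ W + b   # shape (2**dim, n_classes)
log_p = log_softmax(logits)[:, y]          # log p(y | x, mask) for every mask

dropout_objective = float(np.dot(probs, log_p))  # E_mask[log p(y | x, mask)]
det_log_p = float(log_softmax(x @ W + b)[y])     # deterministic prediction, no mask

print(f"dropout objective E[log p]: {dropout_objective:.4f}")
for alpha in (0.0, 0.5, 1.0):
    print(f"log power mean, alpha={alpha}: {log_power_mean(log_p, probs, alpha):.4f}")
print(f"deterministic log p: {det_log_p:.4f}")
```

The alpha = 0 case is the geometric mean, whose log coincides with the dropout objective, and the power mean inequality makes the values nondecreasing in alpha, with alpha = 1 corresponding to MC averaging. In this toy the deterministic value is at least the dropout objective only because log-softmax is concave in the logits and the logits are affine in the mask (Jensen's inequality); with nonlinear hidden layers this particular argument does not apply.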

Tasks

Language Modelling

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
Penn Treebank (Word Level)	2-layer skip-LSTM + dropout tuning	Test perplexity	55.3	—	Unverified