
Lookahead Optimizer: k steps forward, 1 step back

2019-07-19 · NeurIPS 2019 · Code Available

Michael R. Zhang, James Lucas, Geoffrey Hinton, Jimmy Ba


Abstract

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights generated by another optimizer. We show that Lookahead improves the learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
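The abstract's "k steps forward, 1 step back" update can be made concrete with a small sketch: the inner (fast) optimizer runs for k steps starting from the current slow weights, and the slow weights then move a fraction alpha toward the resulting fast weights. The Python below is a minimal illustration on a toy quadratic loss, not the authors' released code; the names sgd_step, quadratic_grad, and the chosen hyperparameter values are assumptions made for the example.

```python
import numpy as np

def quadratic_grad(theta):
    # Gradient of a toy quadratic loss f(theta) = 0.5 * ||theta||^2 (illustrative only).
    return theta

def sgd_step(theta, lr=0.1):
    # One inner-optimizer ("fast weights") update; any base optimizer could be used here.
    return theta - lr * quadratic_grad(theta)

def lookahead(theta0, inner_step, k=5, alpha=0.5, outer_steps=20):
    """Sketch of the Lookahead outer loop: k fast steps, then one slow interpolation."""
    slow = np.array(theta0, dtype=float)     # slow weights (phi)
    for _ in range(outer_steps):
        fast = slow.copy()                   # fast weights start from the slow weights
        for _ in range(k):
            fast = inner_step(fast)          # "k steps forward" with the inner optimizer
        slow = slow + alpha * (fast - slow)  # "1 step back": interpolate toward the fast weights
    return slow

if __name__ == "__main__":
    print(lookahead(np.ones(3), sgd_step))   # moves toward the minimum at zero
```

Because the slow weights are updated only once every k inner steps, the overhead relative to the inner optimizer is a single interpolation and one extra stored copy of the parameters, which is consistent with the abstract's claim of negligible computation and memory cost.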

Tasks

Benchmark Results

Dataset                           | Model     | Metric         | Claimed | Verified | Status
CIFAR-10 ResNet-18 - 200 Epochs   | Adam      | Accuracy       | 94.84   | -        | Unverified
CIFAR-10 ResNet-18 - 200 Epochs   | Lookahead | Accuracy       | 95.27   | -        | Unverified
CIFAR-10 ResNet-18 - 200 Epochs   | SGD       | Accuracy       | 95.23   | -        | Unverified
ImageNet ResNet-50 - 50 Epochs    | Lookahead | Top-1 Accuracy | 75.13   | -        | Unverified
ImageNet ResNet-50 - 50 Epochs    | SGD       | Top-5 Accuracy | 92.15   | -        | Unverified
ImageNet ResNet-50 - 60 Epochs    | Lookahead | Top-1 Accuracy | 75.49   | -        | Unverified
ImageNet ResNet-50 - 60 Epochs    | SGD       | Top-1 Accuracy | 75.15   | -        | Unverified

Reproductions