Regularizing and Optimizing LSTM Language Models
Stephen Merity, Nitish Shirish Keskar, Richard Socher
Code
- github.com/salesforce/awd-lstm-lm (Official, PyTorch) ★ 0
- github.com/S-Abdelnabi/awt (PyTorch) ★ 54
- github.com/ahmetumutdurmus/awd-lstm (PyTorch) ★ 12
- github.com/jkkummerfeld/emnlp20lm (PyTorch) ★ 3
- github.com/Asteur/RERITES-AvgWeightDescentLSTM-PoetryGeneration (PyTorch) ★ 0
- github.com/alexandra-chron/wassa-2018 (PyTorch) ★ 0
- github.com/llppff/ptb-lstmorqrnn-pytorch (PyTorch) ★ 0
- github.com/mnhng/hier-char-emb (PyTorch) ★ 0
- github.com/BenjiKCF/AWD-LSTM-sentiment-classifier (PyTorch) ★ 0
- github.com/cstorm125/thai2fit (PyTorch) ★ 0
Abstract
Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this paper, we consider the specific problem of word-level language modeling and investigate strategies for regularizing and optimizing LSTM-based models. We propose the weight-dropped LSTM which uses DropConnect on hidden-to-hidden weights as a form of recurrent regularization. Further, we introduce NT-ASGD, a variant of the averaged stochastic gradient method, wherein the averaging trigger is determined using a non-monotonic condition as opposed to being tuned by the user. Using these and other regularization strategies, we achieve state-of-the-art word level perplexities on two data sets: 57.3 on Penn Treebank and 65.8 on WikiText-2. In exploring the effectiveness of a neural cache in conjunction with our proposed model, we achieve an even lower state-of-the-art perplexity of 52.8 on Penn Treebank and 52.0 on WikiText-2.
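The weight-dropped LSTM described above applies DropConnect to the recurrent (hidden-to-hidden) weight matrix rather than to activations. A minimal sketch of the idea in plain Python, assuming a weight matrix represented as a list of rows (the function name and the 1/(1-p) inverted-dropout rescaling convention are illustrative, not taken verbatim from the paper's code):

```python
import random

def weight_drop(w_hh, p, rng):
    """DropConnect on a hidden-to-hidden weight matrix (list of rows).

    Individual recurrent weights are zeroed with probability p; the
    resulting masked matrix would be reused for every timestep of the
    forward pass, which is what makes this a recurrent regularizer
    rather than per-step dropout on hidden states. Kept weights are
    rescaled by 1/(1-p) (inverted-dropout convention; an assumption
    here, frameworks differ on where the rescaling happens).
    """
    return [[wij / (1.0 - p) if rng.random() >= p else 0.0 for wij in row]
            for row in w_hh]

rng = random.Random(0)
w = [[1.0] * 4 for _ in range(4)]   # toy 4x4 recurrent weight matrix
w_dropped = weight_drop(w, 0.5, rng)
# entries of w_dropped are now either 0.0 (dropped) or 2.0 (kept, rescaled)
```

Because the mask is applied to the weights once before the sequence is unrolled, it requires no modification to the LSTM cell itself, which is why the technique composes with black-box RNN implementations.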
Tasks
- Language Modelling

Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Penn Treebank (Word Level) | AWD-LSTM + continuous cache pointer | Test perplexity | 52.8 | — | Unverified |
| Penn Treebank (Word Level) | AWD-LSTM | Test perplexity | 57.3 | — | Unverified |
| WikiText-2 | AWD-LSTM + continuous cache pointer | Test perplexity | 52.0 | — | Unverified |
| WikiText-2 | AWD-LSTM | Test perplexity | 65.8 | — | Unverified |
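The NT-ASGD trigger mentioned in the abstract replaces a user-tuned averaging start point with a non-monotonic condition on validation loss: averaging begins once the model has failed to beat its historical best for several consecutive evaluations. A sketch of that check, assuming losses are recorded once per evaluation (the function name is illustrative; `n` corresponds to the non-monotone interval, for which the paper uses 5):

```python
def should_trigger_averaging(val_loss, prev_losses, n=5):
    """Non-monotonic trigger for switching from SGD to averaged SGD.

    Returns True once the current validation loss is worse than the
    best loss recorded more than n evaluations ago, i.e. the last n
    checks have not improved on the historical best. Until at least
    n+1 previous losses exist there is nothing to compare against.
    """
    if len(prev_losses) <= n:
        return False                      # not enough history yet
    return val_loss > min(prev_losses[:-n])

# toy run: loss improves for a while, then plateaus
history = []
triggered_at = None
losses = [10.0, 8.0, 6.0, 5.0, 5.1, 5.2, 5.1, 5.3, 5.2, 5.4]
for step, loss in enumerate(losses):
    if triggered_at is None and should_trigger_averaging(loss, history, n=5):
        triggered_at = step
    history.append(loss)
# triggered_at == 9: by step 9 the best loss (5.0) is more than n checks old
```

Excluding the most recent `n` losses from the minimum is what makes the condition non-monotonic: short plateaus or small upticks within the window do not trigger averaging prematurely.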