
Mogrifier LSTM

2019-09-04 · ICLR 2020 · Code Available

Gábor Melis, Tomáš Kočiský, Phil Blunsom


Abstract

Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
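Below is a minimal PyTorch sketch of the mutual gating ("mogrification") the abstract describes: before a standard LSTM step, the input x and the previous state h alternately gate each other for a fixed number of rounds, x ← 2σ(Qh) ⊙ x and h ← 2σ(Rx) ⊙ h. The class name, the default of 5 rounds, and the use of full-rank projections (the paper factorizes Q and R to low rank) are illustrative assumptions, not details taken from this page.

```python
import torch
import torch.nn as nn

class MogrifierLSTMCell(nn.Module):
    """LSTM cell preceded by `rounds` steps of mutual gating between
    the current input x and the previous hidden state h."""

    def __init__(self, input_size: int, hidden_size: int, rounds: int = 5):
        super().__init__()
        # Odd rounds gate x with a projection of h; even rounds gate h
        # with a projection of x. The paper uses low-rank factorized
        # matrices; full-rank projections are used here for brevity.
        self.mog = nn.ModuleList(
            nn.Linear(hidden_size, input_size, bias=False) if i % 2 == 1
            else nn.Linear(input_size, hidden_size, bias=False)
            for i in range(1, rounds + 1)
        )
        self.lstm = nn.LSTMCell(input_size, hidden_size)

    def forward(self, x, state):
        h, c = state
        for i, proj in enumerate(self.mog, start=1):
            if i % 2 == 1:
                x = 2 * torch.sigmoid(proj(h)) * x  # x^i = 2σ(Q^i h) ⊙ x^{i-2}
            else:
                h = 2 * torch.sigmoid(proj(x)) * h  # h^i = 2σ(R^i x) ⊙ h^{i-2}
        return self.lstm(x, (h, c))
```

A single timestep then looks like an ordinary LSTM cell call, e.g.:

```python
cell = MogrifierLSTMCell(input_size=256, hidden_size=512)
x = torch.randn(8, 256)                              # batch of 8 inputs
state = (torch.zeros(8, 512), torch.zeros(8, 512))   # (h, c)
h, c = cell(x, state)
```

The factor of 2 keeps the expected scale of the gated activations unchanged, since sigmoid outputs average around 0.5; with zero-initialized projections the cell reduces exactly to a plain LSTM.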

Tasks

Language Modelling

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| enwik8 | Mogrifier LSTM | Bits per Character (BPC) | 1.15 | — | Unverified |
| enwik8 | LSTM | Bits per Character (BPC) | 1.2 | — | Unverified |
| Hutter Prize | Mogrifier LSTM | Bits per Character (BPC) | 1.12 | — | Unverified |
| Hutter Prize | Mogrifier LSTM + dynamic eval | Bits per Character (BPC) | 0.99 | — | Unverified |
| Penn Treebank (Character Level) | Mogrifier LSTM | Bits per Character (BPC) | 1.12 | — | Unverified |
| Penn Treebank (Character Level) | Mogrifier LSTM + dynamic eval | Bits per Character (BPC) | 1.08 | — | Unverified |
| Penn Treebank (Word Level) | Mogrifier LSTM + dynamic eval | Test perplexity | 44.9 | — | Unverified |
| WikiText-2 | Mogrifier LSTM + dynamic eval | Test perplexity | 38.6 | — | Unverified |
| WikiText-2 | Mogrifier LSTM | Test perplexity | 55.1 | — | Unverified |

Reproductions

No reproductions yet. Be the first to reproduce this paper.