Mogrifier LSTM
Gábor Melis, Tomáš Kočiský, Phil Blunsom
Code
- github.com/deepmind/lamb (official, in paper; TensorFlow)
- github.com/microcoder-py/mogrifier-lstm (TensorFlow)
- github.com/RMichaelSwan/MogrifierLSTM (PyTorch)
Abstract
Many advances in Natural Language Processing have been based upon more expressive models for how inputs interact with the context in which they occur. Recurrent networks, which have enjoyed a modicum of success, still lack the generalization and systematicity ultimately required for modelling language. In this work, we propose an extension to the venerable Long Short-Term Memory in the form of mutual gating of the current input and the previous output. This mechanism affords the modelling of a richer space of interactions between inputs and their context. Equivalently, our model can be viewed as making the transition function given by the LSTM context-dependent. Experiments demonstrate markedly improved generalization on language modelling in the range of 3-4 perplexity points on Penn Treebank and Wikitext-2, and 0.01-0.05 bpc on four character-based datasets. We establish a new state of the art on all datasets with the exception of Enwik8, where we close a large gap between the LSTM and Transformer models.
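The mutual gating described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: it assumes the gating runs for a small number of alternating rounds, with the previous output `h` gating the input `x` on odd rounds and `x` gating `h` on even rounds, each through a sigmoid scaled by 2. The function name `mogrify`, the weight lists `Qs` and `Rs`, and the dimensions are hypothetical names chosen for the example. The gated `x` and `h` would then be passed to an ordinary LSTM cell.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def mogrify(x, h, Qs, Rs):
    """Mutually gate the input x and the previous output h.

    Assumed shapes (illustrative only):
      x:  (d_x,)   current input
      h:  (d_h,)   previous LSTM output
      Qs: list of (d_x, d_h) matrices, used on odd rounds to gate x by h
      Rs: list of (d_h, d_x) matrices, used on even rounds to gate h by x
    The number of rounds is len(Qs) + len(Rs); len(Qs) must equal
    len(Rs) or len(Rs) + 1 so that the rounds alternate correctly.
    """
    rounds = len(Qs) + len(Rs)
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # Odd round: the previous output rescales the input.
            x = 2.0 * sigmoid(Qs[i // 2] @ h) * x
        else:
            # Even round: the freshly gated input rescales the previous output.
            h = 2.0 * sigmoid(Rs[i // 2 - 1] @ x) * h
    return x, h


# Usage: five rounds of gating on small random vectors.
rng = np.random.default_rng(0)
d_x, d_h = 3, 2
x = rng.normal(size=d_x)
h = rng.normal(size=d_h)
Qs = [rng.normal(size=(d_x, d_h)) for _ in range(3)]
Rs = [rng.normal(size=(d_h, d_x)) for _ in range(2)]
x_gated, h_gated = mogrify(x, h, Qs, Rs)
```

The factor of 2 means that all-zero gating weights leave `x` and `h` unchanged, since `2 * sigmoid(0) = 1`, so under this assumption the sketch reduces to a plain LSTM input when the gates carry no information.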
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| enwik8 | Mogrifier LSTM | Bits per Character (BPC) | 1.15 | — | Unverified |
| enwik8 | LSTM | Bits per Character (BPC) | 1.2 | — | Unverified |
| Hutter Prize | Mogrifier LSTM | Bits per Character (BPC) | 1.12 | — | Unverified |
| Hutter Prize | Mogrifier LSTM + dynamic eval | Bits per Character (BPC) | 0.99 | — | Unverified |
| Penn Treebank (Character Level) | Mogrifier LSTM | Bits per Character (BPC) | 1.12 | — | Unverified |
| Penn Treebank (Character Level) | Mogrifier LSTM + dynamic eval | Bits per Character (BPC) | 1.08 | — | Unverified |
| Penn Treebank (Word Level) | Mogrifier LSTM + dynamic eval | Test perplexity | 44.9 | — | Unverified |
| WikiText-2 | Mogrifier LSTM + dynamic eval | Test perplexity | 38.6 | — | Unverified |
| WikiText-2 | Mogrifier LSTM | Test perplexity | 55.1 | — | Unverified |