An Analysis of Neural Language Modeling at Multiple Scales
2018-03-22
Stephen Merity, Nitish Shirish Keskar, Richard Socher
Code
- github.com/salesforce/awd-lstm-lm (official, in paper; PyTorch)
- github.com/mnhng/hier-char-emb (PyTorch)
- github.com/Han-JD/GRU-D (PyTorch)
- github.com/AtheMathmo/lookahead-lstm (PyTorch)
- github.com/jb33k/awd-lstm-lm-ThinkNet (PyTorch)
- github.com/SachinIchake/KALM (PyTorch)
- github.com/llppff/ptb-lstmorqrnn-pytorch (PyTorch)
- github.com/ari-holtzman/genlm (PyTorch)
- github.com/arvieFrydenlund/awd-lstm-lm (PyTorch)
- github.com/philippwirth/awd-lstm-test (PyTorch)
Abstract
Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
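The QRNN mentioned in the abstract replaces the LSTM's fully recurrent gates with gates computed by convolutions over the input, leaving only a cheap element-wise "fo-pooling" recurrence to run sequentially. A toy scalar sketch of that recurrence (function names and the scalar simplification are illustrative, not the paper's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fo_pool(z_seq, f_seq, o_seq, c0=0.0):
    """QRNN-style fo-pooling over precomputed gate pre-activations.

    z_seq: candidate values, f_seq/o_seq: forget/output gate
    pre-activations. In a real QRNN these come from convolutions over
    the input, so only this element-wise loop is sequential.
    """
    c, hs = c0, []
    for z, f, o in zip(z_seq, f_seq, o_seq):
        fg = sigmoid(f)
        c = fg * c + (1.0 - fg) * math.tanh(z)  # forget-gated memory update
        hs.append(sigmoid(o) * c)               # output-gated hidden state
    return hs, c
```

Because the gates do not depend on the previous hidden state, all gate activations can be computed in parallel across time steps, which is why the QRNN variants in the paper train quickly on a single GPU.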
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| enwik8 | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.23 | — | Unverified |
| Hutter Prize | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.23 | — | Unverified |
| Penn Treebank (Character Level) | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.18 | — | Unverified |
| Penn Treebank (Character Level) | 6-layer QRNN | Bits per Character (BPC) | 1.19 | — | Unverified |
| WikiText-103 | 4-layer QRNN | Test perplexity | 33.0 | — | Unverified |
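The two metrics in the table are both reparameterizations of the model's average cross-entropy loss: bits per character divides the loss in nats by ln 2, while perplexity exponentiates it. A minimal sketch of the standard conversions (function names are illustrative):

```python
import math

def bpc_from_nats(ce_nats):
    """Bits per character from average cross-entropy in nats per character."""
    return ce_nats / math.log(2)

def perplexity_from_nats(ce_nats):
    """Perplexity from average cross-entropy in nats per token."""
    return math.exp(ce_nats)
```

For example, the claimed 1.18 BPC on character-level Penn Treebank corresponds to an average loss of 1.18 × ln 2 ≈ 0.818 nats per character, and the claimed perplexity of 33.0 on WikiText-103 corresponds to ln 33 ≈ 3.497 nats per word.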