An Analysis of Neural Language Modeling at Multiple Scales
2018-03-22
Stephen Merity, Nitish Shirish Keskar, Richard Socher
Code
- github.com/salesforce/awd-lstm-lm (official, in paper; PyTorch)
- github.com/mnhng/hier-char-emb (PyTorch)
- github.com/Han-JD/GRU-D (PyTorch)
- github.com/AtheMathmo/lookahead-lstm (PyTorch)
- github.com/jb33k/awd-lstm-lm-ThinkNet (PyTorch)
- github.com/SachinIchake/KALM (PyTorch)
- github.com/llppff/ptb-lstmorqrnn-pytorch (PyTorch)
- github.com/ari-holtzman/genlm (PyTorch)
- github.com/arvieFrydenlund/awd-lstm-lm (PyTorch)
- github.com/philippwirth/awd-lstm-test (PyTorch)
Abstract
Many of the leading approaches in language modeling introduce novel, complex, and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
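The QRNN mentioned in the abstract replaces the LSTM's fully recurrent gates with gates computed by convolutions over the input, leaving only a cheap element-wise "fo-pooling" recurrence to run sequentially. A toy scalar sketch of that recurrence (function names and the scalar simplification are illustrative, not the paper's implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def fo_pool(z_seq, f_seq, o_seq, c0=0.0):
    """QRNN-style fo-pooling over precomputed gate pre-activations.

    z_seq: candidate values, f_seq/o_seq: forget/output gate
    pre-activations. In a real QRNN these come from convolutions over
    the input, so only this element-wise loop is sequential.
    """
    c, hs = c0, []
    for z, f, o in zip(z_seq, f_seq, o_seq):
        fg = sigmoid(f)
        c = fg * c + (1.0 - fg) * math.tanh(z)  # forget-gated memory update
        hs.append(sigmoid(o) * c)               # output-gated hidden state
    return hs, c
```

Because the gates do not depend on the previous hidden state, all gate activations can be computed in parallel across time steps, which is why the QRNN variants in the paper train quickly on a single GPU.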
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| enwik8 | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.23 | — | Unverified |
| Hutter Prize | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.23 | — | Unverified |
| Penn Treebank (Character Level) | 3-layer AWD-LSTM | Bits per Character (BPC) | 1.18 | — | Unverified |
| Penn Treebank (Character Level) | 6-layer QRNN | Bits per Character (BPC) | 1.19 | — | Unverified |
| WikiText-103 | 4-layer QRNN | Test perplexity | 33.0 | — | Unverified |
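The two metrics in the table are both reparameterizations of the model's average cross-entropy loss: bits per character divides the loss in nats by ln 2, while perplexity exponentiates it. A minimal sketch of the standard conversions (function names are illustrative):

```python
import math

def bpc_from_nats(ce_nats):
    """Bits per character from average cross-entropy in nats per character."""
    return ce_nats / math.log(2)

def perplexity_from_nats(ce_nats):
    """Perplexity from average cross-entropy in nats per token."""
    return math.exp(ce_nats)
```

For example, the claimed 1.18 BPC on character-level Penn Treebank corresponds to an average loss of 1.18 × ln 2 ≈ 0.818 nats per character, and the claimed perplexity of 33.0 on WikiText-103 corresponds to ln 33 ≈ 3.497 nats per word.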