SOTAVerified

An Analysis of Neural Language Modeling at Multiple Scales

2018-03-22 · Code Available

Stephen Merity, Nitish Shirish Keskar, Richard Socher


Abstract

Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word-level language models based on LSTMs and QRNNs and extend them to both larger vocabularies and character-level granularity. When properly tuned, LSTMs and QRNNs achieve state-of-the-art results on character-level (Penn Treebank, enwik8) and word-level (WikiText-103) datasets, respectively. Results are obtained in only 12 hours (WikiText-103) to 2 days (enwik8) using a single modern GPU.
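To make the character-level setup concrete, here is a minimal sketch of a multi-layer LSTM language model in PyTorch. This is an illustrative assumption, not the paper's AWD-LSTM: the actual model additionally uses weight-dropped LSTMs, tied weights and an adaptive softmax for large vocabularies, and the layer sizes below are placeholders.

```python
import torch
import torch.nn as nn

class CharLSTM(nn.Module):
    """Bare-bones character-level LSTM LM (hyperparameters are illustrative,
    not the paper's; the real AWD-LSTM adds weight drop and other regularizers)."""
    def __init__(self, vocab_size=256, emb_dim=128, hidden_dim=512, layers=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, num_layers=layers, batch_first=True)
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        emb = self.embed(x)                  # (batch, seq, emb_dim)
        out, hidden = self.lstm(emb, hidden)
        return self.decoder(out), hidden     # per-position logits over characters

model = CharLSTM()
tokens = torch.randint(0, 256, (4, 32))      # 4 sequences of 32 character ids
logits, _ = model(tokens)
print(logits.shape)                          # torch.Size([4, 32, 256])
```

Training minimizes the cross-entropy between these logits and the next character at each position; the averaged loss is what the BPC numbers below summarize.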

Benchmark Results

| Dataset | Model | Metric | Claimed | Verified | Status |
| --- | --- | --- | --- | --- | --- |
| enwik8 | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | — | Unverified |
| Hutter Prize | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.23 | — | Unverified |
| Penn Treebank (Character Level) | AWD-LSTM (3 layers) | Bits per Character (BPC) | 1.18 | — | Unverified |
| Penn Treebank (Character Level) | QRNN (6 layers) | Bits per Character (BPC) | 1.19 | — | Unverified |
| WikiText-103 | QRNN (4 layers) | Test perplexity | 33 | — | Unverified |
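The two metrics in the table are directly related: bits per character is the model's average cross-entropy expressed in base 2, and perplexity is the exponentiated cross-entropy. A short stdlib-only sketch of the standard conversions:

```python
import math

def nats_to_bpc(loss_nats: float) -> float:
    """Convert average cross-entropy (in nats) to bits per character."""
    return loss_nats / math.log(2)

def bpc_to_perplexity(bpc: float) -> float:
    """Per-character perplexity implied by a BPC score."""
    return 2 ** bpc

# The enwik8 claim of 1.23 BPC corresponds to ~0.85 nats of cross-entropy,
# i.e. a per-character perplexity of about 2.35.
loss_nats = 1.23 * math.log(2)
print(round(nats_to_bpc(loss_nats), 2))    # 1.23
print(round(bpc_to_perplexity(1.23), 2))   # 2.35
```

Note that the WikiText-103 perplexity of 33 is per *word*, so it is not comparable to the character-level perplexities implied by the BPC rows.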

Reproductions