On the State of the Art of Evaluation in Neural Language Models

2017-07-18 · ICLR 2018 · Code Available

Gábor Melis, Chris Dyer, Phil Blunsom

Code

github.com/deepmind/lamb
Framework: TensorFlow

Abstract

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited computational resources, which represent uncontrolled sources of experimental variation. We reevaluate several popular architectures and regularisation methods with large-scale automatic black-box hyperparameter tuning and arrive at the somewhat surprising conclusion that standard LSTM architectures, when properly regularised, outperform more recent models. We establish a new state of the art on the Penn Treebank and Wikitext-2 corpora, as well as strong baselines on the Hutter Prize dataset.

Tasks

Language Modelling

Benchmark Results

Dataset	Model	Metric	Claimed	Verified	Status
WikiText-2	Melis et al. (2017) - 1-layer LSTM (tied)	Test perplexity	65.9	—	Unverified
