
Improving Neural Language Models with Weight Norm Initialization and Regularization

2018-10-01 · WS 2018

Christian Herold, Yingbo Gao, Hermann Ney


Abstract

Embedding and projection matrices are commonly used in neural language models (NLM) as well as in other sequence processing networks that operate on large vocabularies. We examine such matrices in fine-tuned language models and observe that an NLM learns word vectors whose norms are related to the word frequencies. We show that by initializing the weight norms with scaled log word counts, together with other techniques, lower perplexities can be obtained in early epochs of training. We also introduce a weight norm regularization loss term, whose hyperparameters are tuned via a grid search. With this method, we are able to significantly improve perplexities on two word-level language modeling tasks (without dynamic evaluation): from 54.44 to 53.16 on Penn Treebank (PTB) and from 61.45 to 60.13 on WikiText-2 (WT2).
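The initialization idea from the abstract can be sketched in a few lines: draw random embedding directions, then rescale each word vector so its norm is proportional to the log of that word's training-set count. The function below is a minimal illustration, not the paper's implementation; the `scale` hyperparameter and the input format (`word_counts` ordered by vocabulary index) are assumptions for the sketch.

```python
import numpy as np

def init_embeddings(word_counts, dim, scale=1.0, seed=0):
    """Initialize an embedding matrix whose row norms are scaled log word counts.

    word_counts: per-word training counts, ordered by vocabulary index
                 (hypothetical input format for this sketch).
    dim:         embedding dimension.
    scale:       norm scaling hyperparameter (assumed; tuned in practice).
    """
    rng = np.random.default_rng(seed)
    vocab_size = len(word_counts)
    # Random directions, normalized to unit length per row.
    W = rng.standard_normal((vocab_size, dim))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    # Rescale each row so its norm equals scale * log(count).
    norms = scale * np.log(np.asarray(word_counts, dtype=float))
    return W * norms[:, None]
```

Frequent words thus start with larger-norm vectors than rare ones, mirroring the frequency–norm relationship the authors observe in trained models.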
