Breaking the Softmax Bottleneck: A High-Rank RNN Language Model
Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
Code
- github.com/zihangdai/mos (official, in paper; PyTorch, ★ 0)
- github.com/yfreedomliTHU/mos-pytorch1.1 (PyTorch, ★ 3)
- github.com/cstorm125/thai2fit (PyTorch, ★ 0)
- github.com/nunezpaul/MNIST (TensorFlow, ★ 0)
- github.com/zhangyaoyuan/GAN-Simplification (TensorFlow, ★ 0)
- github.com/nkcr/overlap-ml (PyTorch, ★ 0)
- github.com/omerlux/Recurrent_Neural_Network_-_Part_2 (TensorFlow, ★ 0)
- github.com/tdmeeste/SparseSeqModels (PyTorch, ★ 0)
- github.com/omerlux/NLP-PTB (PyTorch, ★ 0)
Abstract
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck. Given that natural language is highly context-dependent, this further implies that in practice Softmax with distributed word embeddings does not have enough capacity to model natural language. We propose a simple and effective method to address this issue, and improve the state-of-the-art perplexities on Penn Treebank and WikiText-2 to 47.69 and 40.68 respectively. The proposed method also excels on the large-scale 1B Word dataset, outperforming the baseline by over 5.6 points in perplexity.
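The bottleneck argument can be illustrated numerically. A single softmax produces a log-probability matrix whose rank is at most the hidden dimension plus one, while the paper's proposed Mixture of Softmaxes (MoS) takes a log of a weighted mixture, which is not a low-rank operation. The sketch below is a minimal numpy illustration under assumed random parameters, not the paper's implementation; the component projections `P_k` and the Dirichlet-sampled mixture weights `pi` stand in for the learned prior and projections.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, V, K = 50, 4, 30, 3  # contexts, hidden dim, vocab size, mixture components

H = rng.standard_normal((N, d))  # context (hidden) vectors
W = rng.standard_normal((V, d))  # output word embeddings

# Single softmax: log P = log_softmax(H W^T).
# rank(H W^T) <= d, and the per-row normalizer adds at most 1, so rank <= d + 1.
logits = H @ W.T
single = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

# Mixture of Softmaxes: K component softmaxes mixed with per-context weights.
# (In the paper both are computed from the context; here they are random stand-ins.)
P_k = [rng.standard_normal((d, d)) for _ in range(K)]
pi = rng.dirichlet(np.ones(K), size=N)  # mixture weights, one row per context
probs = np.zeros((N, V))
for k in range(K):
    lk = (H @ P_k[k]) @ W.T
    sk = np.exp(lk - lk.max(axis=1, keepdims=True))
    probs += pi[:, [k]] * (sk / sk.sum(axis=1, keepdims=True))
mos = np.log(probs)  # log of a mixture: no low-rank bound applies

print(np.linalg.matrix_rank(single))  # bounded by d + 1 = 5
print(np.linalg.matrix_rank(mos))     # typically much higher
```

The empirical rank gap is the point: a true natural-language log-probability matrix is argued to be high-rank, so the single-softmax bound of d + 1 caps expressiveness, whereas MoS escapes it without increasing the embedding dimension.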
Benchmark Results
| Dataset | Model | Metric | Claimed | Verified | Status |
|---|---|---|---|---|---|
| Penn Treebank (Word Level) | AWD-LSTM-MoS + dynamic eval | Test perplexity | 47.69 | — | Unverified |
| Penn Treebank (Word Level) | AWD-LSTM-MoS | Test perplexity | 54.44 | — | Unverified |
| WikiText-2 | AWD-LSTM-MoS + dynamic eval | Test perplexity | 40.68 | — | Unverified |
| WikiText-2 | AWD-LSTM-MoS | Test perplexity | 61.45 | — | Unverified |