LSH Softmax: Sub-Linear Learning and Inference of the Softmax Layer in Deep Architectures

2018-01-01 · ICLR 2018

Daniel Levy, Danlu Chen, Stefano Ermon


Abstract

Log-linear models are widely used in machine learning, and in particular are ubiquitous in deep learning architectures in the form of the softmax. While exact inference and learning of these models require linear time, both can be done approximately in sub-linear time with strong concentration guarantees. In this work, we present LSH Softmax, a method to perform sub-linear learning and inference of the softmax layer in the deep learning setting. Our method relies on the popular Locality-Sensitive Hashing to build a well-concentrated gradient estimator, using nearest neighbors and uniform samples. We also present an inference scheme in sub-linear time for LSH Softmax using the Gumbel distribution. On language modeling, we show that Recurrent Neural Networks trained with LSH Softmax perform on par with computing the exact softmax while requiring sub-linear computations.
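To make the two ingredients named in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes the estimator combines the exact contribution of the highest-scoring classes with a rescaled average over uniformly sampled remaining classes, and it shows the standard Gumbel-max trick for sampling from a softmax. The function names (`brute_force_top_k`, `estimate_log_partition`, `gumbel_max_sample`) are hypothetical, and the brute-force top-k search is a linear-time stand-in for the LSH nearest-neighbor lookup, so the sketch itself is not sub-linear.

```python
import numpy as np


def brute_force_top_k(logits, k):
    """Indices of the k largest logits.

    Assumption: this linear-time search stands in for the LSH
    nearest-neighbor retrieval described in the abstract.
    """
    return np.argpartition(-logits, k - 1)[:k]


def estimate_log_partition(logits, top_idx, num_tail_samples, rng):
    """Estimate log(sum_i exp(logits[i])) from a head set plus uniform tail samples.

    The head (top_idx) is summed exactly. The tail sum is estimated as
    (tail size) * (mean of exp(logit) over uniform tail samples), which is
    an unbiased estimate of the tail's contribution to the partition function.
    """
    n = logits.shape[0]
    # Building this mask is O(n); a real sub-linear scheme would avoid it.
    tail_mask = np.ones(n, dtype=bool)
    tail_mask[top_idx] = False
    tail_idx = np.flatnonzero(tail_mask)

    head = np.exp(logits[top_idx]).sum()
    if tail_idx.size == 0 or num_tail_samples == 0:
        return np.log(head)

    sampled = rng.choice(tail_idx, size=num_tail_samples, replace=True)
    tail = tail_idx.size * np.exp(logits[sampled]).mean()
    return np.log(head + tail)


def gumbel_max_sample(logits, rng):
    """Draw one class index from softmax(logits) with the Gumbel-max trick.

    Adding independent standard Gumbel noise to each logit and taking the
    argmax yields an exact sample from the softmax distribution. This
    version perturbs every logit, so it runs in linear time.
    """
    noise = rng.gumbel(size=logits.shape)
    return int(np.argmax(logits + noise))
```

A typical call would build `rng = np.random.default_rng(seed)`, pick `top_idx = brute_force_top_k(logits, k)`, and pass both to `estimate_log_partition`. The estimate is exact whenever the tail logits are all equal, and its variance shrinks as the head captures more of the probability mass or as `num_tail_samples` grows.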

Tasks

Deep Learning, Language Modeling
