Loss Decomposition for Fast Learning in Large Output Spaces

2018-07-01ICML 2018Unverified0· sign in to hype

Ian En-Hsu Yen, Satyen Kale, Felix Yu, Daniel Holtmann-Rice, Sanjiv Kumar, Pradeep Ravikumar

Unverified — Be the first to reproduce this paper.

Abstract

For problems with large output spaces, evaluation of the loss function and its gradient are expensive, typically taking linear time in the size of the output space. Recently, methods have been developed to speed up learning via efficient data structures for Nearest-Neighbor Search (NNS) or Maximum Inner-Product Search (MIPS). However, the performance of such data structures typically degrades in high dimensions. In this work, we propose a novel technique to reduce the intractable high dimensional search problem to several much more tractable lower dimensional ones via dual decomposition of the loss function. At the same time, we demonstrate guaranteed convergence to the original loss via a greedy message passing procedure. In our experiments on multiclass and multilabel classification with hundreds of thousands of classes, as well as training skip-gram word embeddings with a vocabulary size of half a million, our technique consistently improves the accuracy of search-based gradient approximation methods and outperforms sampling-based gradient approximation methods by a large margin.

Tasks

Word Embeddings

Loss Decomposition for Fast Learning in Large Output Spaces

Abstract

Tasks

Reproductions