
Learning Succinct Models: Pipelined Compression with L1-Regularization, Hashing, Elias-Fano Indices, and Quantization

2016-12-01 · COLING 2016

Hajime Senuma, Akiko Aizawa


Abstract

The recent proliferation of smart devices necessitates methods for learning small-sized models. This paper demonstrates that if there are m features in total but only n = o(m) features are required to distinguish examples, then with Ω(m) training examples and reasonable settings it is possible to obtain a good model in a succinct representation using n log_2 (m/n) + o(m) bits, by using a pipeline of existing compression methods: L1-regularized logistic regression, feature hashing, Elias-Fano indices, and randomized quantization. An experiment shows that a noun phrase chunking model requiring 27 megabytes in an existing library can be compressed to less than 13 kilobytes without notable loss of accuracy.
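The n log_2 (m/n) term in the bound matches the classic Elias-Fano space cost for storing the n surviving feature indices out of a universe of m. The sketch below is a textbook Elias-Fano encoder/decoder, not the authors' implementation: each index contributes l = floor(log2(m/n)) explicit low bits, while the high parts are stored as a unary-coded sequence of about 2n bits.

```python
import math

def elias_fano_encode(values, m):
    """Encode a sorted list of n distinct integers in [0, m).

    Uses roughly n*log2(m/n) low bits plus about 2n upper bits
    (the textbook Elias-Fano layout).
    """
    n = len(values)
    l = max(0, int(math.floor(math.log2(m / n))))  # low-bit width per value
    mask = (1 << l) - 1
    low_bits = [v & mask for v in values]
    # Upper parts in unary: one '1' bit per element, with a '0' bit
    # marking each step to the next upper-part "bucket".
    upper = []
    prev_bucket = 0
    for v in values:
        bucket = v >> l
        upper.extend([0] * (bucket - prev_bucket))
        upper.append(1)
        prev_bucket = bucket
    return l, low_bits, upper

def elias_fano_decode(l, low_bits, upper):
    """Recover the original sorted values from the two bit sequences."""
    values = []
    bucket = 0
    i = 0
    for bit in upper:
        if bit == 0:
            bucket += 1          # advance to the next upper-part bucket
        else:
            values.append((bucket << l) | low_bits[i])
            i += 1
    return values
```

For example, storing n = 5 indices drawn from m = 32 positions costs 5 low-bit fields of 2 bits each plus 12 unary bits, already fewer bits than a plain 32-bit occupancy bitmap; the gap widens as m grows relative to n, which is the regime (n = o(m)) the paper targets.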
