From Information Bottleneck To Activation Norm Penalty

2018-01-01 · ICLR 2018

Allen Nie, Mihir Mongia, James Zou

Abstract

Many regularization methods have been proposed to prevent overfitting in neural networks. Recently, a regularization method has been proposed to optimize the variational lower bound of the Information Bottleneck Lagrangian. However, this method cannot be generalized to regular neural network architectures. We present the activation norm penalty, which is derived from the information bottleneck principle and is theoretically grounded in a variational dropout framework. Unlike previous approaches, it can be applied to any general neural network. We demonstrate that this penalty gives consistent improvements to different state-of-the-art architectures in both language modeling and image classification. We present analyses of the properties of this penalty and compare it to other methods that also reduce mutual information.
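To make the idea of a norm penalty on activations concrete, here is a minimal sketch. The abstract does not give the exact form of the penalty, so this assumes a squared L2 norm on a layer's activations, averaged over the batch and scaled by a coefficient. The names `activation_norm_penalty`, `penalized_loss`, and `eta` are hypothetical and do not come from the paper.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy over a batch; labels are integer class ids."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

def activation_norm_penalty(hidden, eta):
    """Assumed form: eta times the batch mean of the squared L2 norm of each activation vector."""
    return eta * np.mean(np.sum(hidden ** 2, axis=1))

def penalized_loss(hidden, weights, labels, eta=1e-3):
    """Task loss on a linear output layer plus the assumed activation penalty."""
    logits = hidden @ weights
    return softmax_cross_entropy(logits, labels) + activation_norm_penalty(hidden, eta)

# Toy batch: 4 examples, 8 hidden units, 3 classes (illustrative values only).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))
weights = rng.normal(size=(8, 3))
labels = np.array([0, 2, 1, 0])

base = penalized_loss(hidden, weights, labels, eta=0.0)
penalized = penalized_loss(hidden, weights, labels, eta=1e-3)
```

In this sketch the penalty depends only on the activations, not on the architecture that produced them, which is one way to read the abstract's claim that it can be attached to any general neural network.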
