
Dense Recurrent Neural Network with Attention Gate

2018-01-01 · ICLR 2018

Yong-Ho Yoo, Kook Han, Sanghyun Cho, Kyoung-Chul Koh, Jong-Hwan Kim


Abstract

We propose the dense RNN, which has full connections from each hidden state directly to multiple preceding hidden states of all layers. As the density of the connections increases, the number of paths through which the gradient flows increases. This increases the magnitude of the gradients, which helps prevent the vanishing gradient problem over time. Larger gradients, however, can also cause the exploding gradient problem. To balance the trade-off between the two problems, we propose an attention gate, which controls the amount of gradient flow. We describe the relation between the attention gate and the gradient flow by approximation. An experiment on language modeling using the Penn Treebank corpus shows that dense connections with the attention gate improve the model's performance.
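The abstract does not include an implementation, but the described architecture can be sketched directly: each new hidden state receives direct connections from multiple preceding hidden states of all layers, and a learned sigmoid attention gate scales each connection, and thereby the gradient that flows back through it. Below is a minimal PyTorch sketch of that idea; the class name, the per-connection scalar gates, and the weight layout are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of a dense RNN step with attention gates, based only on
# the abstract: every (source layer, time lag) pair feeds each target layer
# through its own weight, scaled by a sigmoid gate in (0, 1). Hypothetical
# implementation, not the paper's.
import torch
import torch.nn as nn


class DenseRNNSketch(nn.Module):
    def __init__(self, input_size, hidden_size, n_layers=2, n_lags=2):
        super().__init__()
        self.n_layers, self.n_lags = n_layers, n_lags
        self.inp = nn.Linear(input_size, hidden_size)
        n_conn = n_layers * n_layers * n_lags
        # One recurrent weight and one scalar attention gate per
        # (target layer, source layer, time lag) connection.
        self.rec = nn.ModuleList(
            nn.Linear(hidden_size, hidden_size, bias=False) for _ in range(n_conn)
        )
        self.gate = nn.ModuleList(
            nn.Linear(hidden_size, 1) for _ in range(n_conn)
        )

    def step(self, x, history):
        # history: past hidden states, most recent first; history[d][l] is the
        # layer-l state d+1 steps back, of shape (batch, hidden_size).
        new_states = []
        below = self.inp(x)
        for tgt in range(self.n_layers):
            acc = below if tgt == 0 else new_states[-1]
            for lag in range(min(self.n_lags, len(history))):
                for src in range(self.n_layers):
                    idx = (tgt * self.n_layers + src) * self.n_lags + lag
                    h_prev = history[lag][src]
                    # Sigmoid attention gate: controls how much signal (and
                    # hence gradient) flows through this dense connection.
                    g = torch.sigmoid(self.gate[idx](h_prev))
                    acc = acc + g * self.rec[idx](h_prev)
            new_states.append(torch.tanh(acc))
        return new_states


# Example usage: unroll over a short random sequence.
cell = DenseRNNSketch(input_size=8, hidden_size=16)
history = []
for x in torch.randn(5, 3, 8):   # 5 time steps, batch of 3
    states = cell.step(x, history)
    history.insert(0, states)    # keep most recent states first
```

A gate near zero effectively prunes a connection, limiting gradient magnitude, while a gate near one keeps the extra gradient path open, which is the trade-off the abstract describes.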
