Efficient Sequence Learning with Group Recurrent Networks

2018-06-01 · NAACL 2018

Fei Gao, Lijun Wu, Li Zhao, Tao Qin, Xue-Qi Cheng, Tie-Yan Liu


Abstract

Recurrent neural networks have achieved state-of-the-art results in many artificial intelligence tasks, such as language modeling, neural machine translation, and speech recognition. One of the key factors behind these successes is large models. However, training such large models usually takes days or even weeks, even when using tens of GPU cards. In this paper, we propose an architecture that improves the efficiency of RNN model training by adopting a group strategy for recurrent layers while exploiting a representation rearrangement strategy between layers as well as between time steps. To demonstrate the advantages of our models, we conduct experiments on several datasets and tasks. The results show that our architecture achieves accuracy comparable to or better than the baselines, with a much smaller number of parameters and at a much lower computational cost.
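The abstract names two ideas, a group strategy for recurrent layers and a rearrangement of representations, without giving the equations. The following is a minimal sketch of how such a layer could look, not the authors' implementation. It assumes a plain tanh recurrent cell with no bias, equal-sized groups, and an interleaving shuffle as the rearrangement; the function names (`init_group_weights`, `group_rnn_step`, `rearrange`, `count_params`) are hypothetical and chosen only for illustration.

```python
import math
import random


def init_group_weights(input_size, hidden_size, num_groups, seed=0):
    """Create one small weight matrix per group (assumed layout, no bias)."""
    assert input_size % num_groups == 0 and hidden_size % num_groups == 0
    rng = random.Random(seed)
    in_g = input_size // num_groups
    hid_g = hidden_size // num_groups
    # Each group maps its own slice [x_g ; h_g] to hid_g hidden units.
    return [
        [[rng.uniform(-0.1, 0.1) for _ in range(in_g + hid_g)]
         for _ in range(hid_g)]
        for _ in range(num_groups)
    ]


def group_rnn_step(x, h, weights):
    """One tanh recurrent step in which each group sees only its own slices."""
    num_groups = len(weights)
    in_g = len(x) // num_groups
    hid_g = len(h) // num_groups
    new_h = []
    for g, w in enumerate(weights):
        z = x[g * in_g:(g + 1) * in_g] + h[g * hid_g:(g + 1) * hid_g]
        new_h.extend(
            math.tanh(sum(wi * zi for wi, zi in zip(row, z))) for row in w
        )
    return new_h


def rearrange(h, num_groups):
    """Interleave units so the next step's groups mix values from all groups."""
    size = len(h) // num_groups
    return [h[g * size + i] for i in range(size) for g in range(num_groups)]


def count_params(weights):
    """Total number of scalar weights across all groups."""
    return sum(len(row) for w in weights for row in w)


if __name__ == "__main__":
    dense = init_group_weights(8, 8, num_groups=1)
    grouped = init_group_weights(8, 8, num_groups=4)
    print(count_params(dense), count_params(grouped))

    h = [0.0] * 8
    for x in ([0.5] * 8, [-0.5] * 8):
        # Rearranging after each step lets information cross group boundaries.
        h = rearrange(group_rnn_step(x, h, grouped), num_groups=4)
    print(len(h))
```

Under these assumptions, splitting a layer into K groups divides its weight count by K, because each group multiplies only a 1/K slice of the input and hidden state by a matrix with 1/K of the rows. Without the rearrangement step the groups would never exchange information, which is the role the sketch assigns to `rearrange`.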

Tasks

GPU, Language Modeling, Machine Translation, Speech Recognition, Translation
