Generating Diverse Translation by Manipulating Multi-Head Attention
Zewei Sun, Shu-Jian Huang, Hao-Ran Wei, Xin-yu Dai, Jia-Jun Chen
Abstract
The Transformer model has been widely used in machine translation tasks and has obtained state-of-the-art results. In this paper, we report an interesting phenomenon in its encoder-decoder multi-head attention: different attention heads of the final decoder layer align to different word translation candidates. We empirically verify this discovery and propose a method to generate diverse translations by manipulating heads. Furthermore, we make use of these diverse translations with the back-translation technique for better data augmentation. Experimental results show that our method generates diverse translations without a severe drop in translation quality. Experiments also show that back-translation with these diverse translations brings significant improvements in translation performance. An auxiliary experiment on a conversational response generation task demonstrates the effect of diversity as well.
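To make the core idea concrete, the sketch below illustrates one simple way of "manipulating heads" in encoder-decoder multi-head attention: zeroing out a chosen head's output in the final decoder layer before the output projection, so that the alignment candidate favored by that head no longer contributes to the next-token distribution. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the function name, shapes, and the specific zero-out manipulation are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def cross_attention_with_head_mask(query, key, value, num_heads, w_o,
                                   masked_head=None):
    """Multi-head encoder-decoder attention with an optional head mask.

    query: (tgt_len, d_model)  decoder-side queries
    key, value: (src_len, d_model)  encoder outputs
    w_o: (d_model, d_model)  output projection
    masked_head: index of the head to zero out, or None
    (Hypothetical sketch; shapes and names are illustrative assumptions.)
    """
    tgt_len, d_model = query.shape
    src_len = key.shape[0]
    d_head = d_model // num_heads

    # Split into heads: (num_heads, length, d_head)
    q = query.view(tgt_len, num_heads, d_head).transpose(0, 1)
    k = key.view(src_len, num_heads, d_head).transpose(0, 1)
    v = value.view(src_len, num_heads, d_head).transpose(0, 1)

    # Scaled dot-product attention, computed per head.
    scores = q @ k.transpose(1, 2) / d_head ** 0.5   # (heads, tgt, src)
    weights = F.softmax(scores, dim=-1)              # per-head alignments
    heads_out = weights @ v                          # (heads, tgt, d_head)

    # The manipulation: silence one head so its translation candidate
    # stops influencing the decoder output.
    if masked_head is not None:
        heads_out[masked_head] = 0.0

    out = heads_out.transpose(0, 1).reshape(tgt_len, d_model) @ w_o
    return out, weights

# Usage sketch: masking a different head at each decoding run would
# steer the model toward different candidate translations.
torch.manual_seed(0)
d_model, num_heads = 512, 8
q = torch.randn(1, d_model)        # one decoding step
k = torch.randn(6, d_model)        # six source tokens
v = torch.randn(6, d_model)
w_o = torch.randn(d_model, d_model)
for h in range(num_heads):
    out, _ = cross_attention_with_head_mask(q, k, v, num_heads, w_o,
                                            masked_head=h)
```

Under this reading, each choice of masked head yields a slightly different decoder state and hence a potentially different output token, which is one plausible mechanism for sampling the diverse translations described above.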