SOTA Verified

Double Attention-based Multimodal Neural Machine Translation with Semantic Image Regions

2020-11-03 · EAMT 2020 · Code Available

Yuting Zhao, Mamoru Komachi, Tomoyuki Kajiwara, Chenhui Chu


Abstract

Existing studies on multimodal neural machine translation (MNMT) have mainly focused on combining visual and textual modalities to improve translations. However, it has been suggested that the visual modality is only marginally beneficial. Conventional visual attention mechanisms select visual features from equally-sized grids generated by convolutional neural networks (CNNs), and may have had only a modest effect on aligning visual concepts with the textual objects they correspond to, because grid visual features do not capture semantic information. In contrast, we propose applying semantic image regions to MNMT by integrating visual and textual features through two individual attention mechanisms (double attention). We conducted experiments on the Multi30k dataset and achieved improvements of 0.5 and 0.9 BLEU points on the English-German and English-French translation tasks, respectively, compared with MNMT using grid visual features. We also demonstrated concrete improvements in translation performance attributable to the semantic image regions.
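The double attention the abstract describes — one attention over textual encoder states and a separate one over semantic image-region features, fused at each decoding step — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the dot-product scoring, and the concatenation fusion are all assumptions, and the region features are assumed to already be projected into the same dimension as the decoder state.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(query, keys, values):
    # Scaled dot-product attention: one decoder query against a set of keys.
    scores = keys @ query / np.sqrt(query.shape[-1])   # (n,)
    weights = softmax(scores)                          # (n,) sums to 1
    return weights @ values, weights                   # context: (d,)

def double_attention_step(hidden, text_states, region_feats):
    """One decoding step with two individual attention mechanisms:
    one over textual encoder states, one over semantic image regions.
    The concatenation fusion here is an illustrative assumption."""
    text_ctx, text_w = attention(hidden, text_states, text_states)
    img_ctx, img_w = attention(hidden, region_feats, region_feats)
    return np.concatenate([text_ctx, img_ctx]), text_w, img_w

rng = np.random.default_rng(0)
d = 8
hidden = rng.standard_normal(d)               # current decoder state
text_states = rng.standard_normal((5, d))     # 5 source-token states
region_feats = rng.standard_normal((3, d))    # 3 semantic image regions
ctx, text_w, img_w = double_attention_step(hidden, text_states, region_feats)
print(ctx.shape)
```

The key contrast with grid-based visual attention is the second argument: `region_feats` holds one vector per detected semantic region (e.g. from an object detector) rather than one per fixed CNN grid cell, so the image attention can align whole visual concepts with source words.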
