MM-GATBT: Enriching Multimodal Representation Using Graph Attention Network

2022-07-01 · NAACL (ACL) 2022 · Code Available

Seung Byum Seo, Hyoungwook Nam, Payam Delgosha

Abstract

Recent advances in Natural Language Processing (NLP) have largely been driven by applying self-attention mechanisms to single or multiple modalities. Although this approach has brought significant improvements in multiple downstream tasks, it fails to capture the interactions between different entities. We therefore propose MM-GATBT, a multimodal graph representation learning model that captures not only the relational semantics within one modality but also the interactions between different modalities. Specifically, the proposed method constructs image-based node embeddings that contain the relational semantics of entities. Our empirical results show that MM-GATBT achieves state-of-the-art results among published methods on the MM-IMDb dataset.
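The abstract does not detail the paper's architecture, but the graph attention mechanism it builds on (GAT, Veličković et al., 2018) can be sketched as follows. This is a minimal NumPy illustration of a single attention layer over a node graph, not the authors' implementation; all function and parameter names here are illustrative assumptions.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Sketch of one graph-attention layer (general GAT mechanism,
    not MM-GATBT's exact architecture).

    h:   (N, F)   node features
    adj: (N, N)   adjacency matrix (1 where an edge exists; self-loops included)
    W:   (F, Fp)  shared linear transform
    a:   (2*Fp,)  attention vector
    """
    z = h @ W                       # transformed node features, shape (N, Fp)
    N = z.shape[0]
    # Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]) for every node pair
    e = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            s = a @ np.concatenate([z[i], z[j]])
            e[i, j] = s if s > 0 else slope * s
    # Mask out non-edges, then softmax over each node's neighborhood
    e = np.where(adj > 0, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(e)
    att = exp / exp.sum(axis=1, keepdims=True)
    # Each node's new feature is the attention-weighted sum of its neighbors
    return att @ z
```

In MM-GATBT the node features would come from an image encoder (the "image-based node embeddings" the abstract mentions), so that attention over graph edges mixes relational semantics across entities.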
