Gated Multimodal Units for Information Fusion

2017-02-07

John Arevalo, Thamar Solorio, Manuel Montes-y-Gómez, Fabio A. González

Code

github.com/johnarevalo/gmu-mmimdb (official, in paper; framework: none; ★ 0)
github.com/terenceylchow124/Meme-MultiModal (pytorch; ★ 12)
github.com/IsaacRodgz/multimodal-transformers-movies (pytorch; ★ 11)
github.com/mv96/mm_extraction (tf; ★ 5)
github.com/IsaacRodgz/ConcatBERT (pytorch; ★ 0)
github.com/TashinAhmed/CNN_BERT (pytorch; ★ 0)
github.com/IsaacRodgz/GMU-Baseline (pytorch; ★ 0)
github.com/TashinAhmed/BERT-Research (pytorch; ★ 0)
github.com/IsaacRodgz/mmbt_experiments (pytorch; ★ 0)

Abstract

This paper presents a novel model for multimodal learning based on gated neural networks. The Gated Multimodal Unit (GMU) model is intended to be used as an internal unit in a neural network architecture whose purpose is to find an intermediate representation based on a combination of data from different modalities. The GMU uses multiplicative gates to learn how each modality influences the activation of the unit. It was evaluated on a multilabel scenario for genre classification of movies using the plot and the poster. The GMU improved the macro F-score over single-modality approaches and outperformed other fusion strategies, including mixture-of-experts models. Along with this work, the MM-IMDb dataset is released, which, to the best of our knowledge, is the largest publicly available multimodal dataset for genre prediction on movies.
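
To make the gating idea in the abstract concrete, the following is a minimal sketch of a two-modality gated unit in plain Python. It is not the authors' implementation. It assumes the commonly cited bimodal form (a tanh hidden vector per modality and a sigmoid gate computed from the concatenated inputs), omits bias terms, and uses hypothetical names (`gmu_forward`, `W_v`, `W_t`, `W_z`) and random, untrained weights.

```python
import math
import random


def _matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def _sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def gmu_forward(x_v, x_t, W_v, W_t, W_z):
    """Illustrative bimodal gated fusion (assumed form, biases omitted).

    h_v = tanh(W_v x_v)             hidden features of the visual input
    h_t = tanh(W_t x_t)             hidden features of the textual input
    z   = sigmoid(W_z [x_v; x_t])   gate computed from both raw inputs
    h   = z * h_v + (1 - z) * h_t   per-dimension multiplicative mixing
    """
    h_v = [math.tanh(a) for a in _matvec(W_v, x_v)]
    h_t = [math.tanh(a) for a in _matvec(W_t, x_t)]
    z = [_sigmoid(a) for a in _matvec(W_z, x_v + x_t)]  # list concat
    h = [zi * hv + (1.0 - zi) * ht for zi, hv, ht in zip(z, h_v, h_t)]
    return h, z


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions (hypothetical): 4 visual features, 3 text features, 2 hidden.
rng = random.Random(0)
x_v = [rng.uniform(-1, 1) for _ in range(4)]
x_t = [rng.uniform(-1, 1) for _ in range(3)]
W_v = random_matrix(2, 4, rng)
W_t = random_matrix(2, 3, rng)
W_z = random_matrix(2, 4 + 3, rng)

h, z = gmu_forward(x_v, x_t, W_v, W_t, W_z)
print("gate:", z)
print("fused:", h)
```

Each gate value lies strictly between 0 and 1, so every fused dimension is a weighted blend of the two modality-specific features. A gate near 1 lets the visual feature dominate that dimension, and a gate near 0 lets the textual one dominate. In a trained network the three weight matrices would be learned jointly with the rest of the model.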

Tasks

General Classification, Genre classification, Mixture-of-Experts
