
E-MMAD: Multimodal Advertising Caption Generation Based on Structured Information

2021-11-16 · ACL ARR November 2021

Anonymous

Abstract

As multimodal tasks have grown increasingly popular in recent years, large-scale datasets with reliable authenticity are in urgent demand. We therefore present E-MMAD, an e-commercial multimodal advertising dataset containing 120 thousand valid examples carefully selected from 1.3 million real product listings, in both Chinese and English. Notably, it is one of the largest video captioning datasets in this field: each example includes a product video (around 30 seconds), a title, a caption, and a structured information table, which is observed to play a vital role in practice. We also introduce a new task for vision-language research based on E-MMAD, e-commercial multimodal advertising generation, which requires using the aforementioned multimodal product information to generate a textual advertisement. Accordingly, we propose a baseline method built on structured information reasoning to address this real-world demand on the dataset.
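The abstract names four fields per example (a ~30-second product video, a title, a caption, and a structured information table) plus a language. A minimal sketch of what one such record might look like is below; the field names, file format, and attribute keys are illustrative assumptions, not the dataset's actual schema.

```python
# Hypothetical sketch of one E-MMAD record, based only on the fields
# named in the abstract. All concrete names and values are assumptions.
from dataclasses import dataclass, field


@dataclass
class EMMADExample:
    video_path: str                 # path to the ~30-second product video
    title: str                      # product title
    caption: str                    # reference advertising caption
    language: str                   # "zh" or "en"
    structured_info: dict = field(default_factory=dict)  # attribute table


example = EMMADExample(
    video_path="videos/000001.mp4",
    title="Wireless noise-cancelling headphones",
    caption="Immerse yourself in pure sound, anywhere.",
    language="en",
    structured_info={"Battery life": "30 h", "Weight": "250 g"},
)
```

A generation model for the proposed task would consume `video_path`, `title`, and `structured_info` and be trained to produce `caption`.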
