
Grounding Semantic Roles in Images

2018-10-01 · EMNLP 2018

Carina Silberer, Manfred Pinkal


Abstract

We address the task of visual semantic role labeling (vSRL): identifying the participants of a situation or event in a visual scene and labeling them with their semantic relations to that event or situation. We render candidate participants as image regions of objects, and train a model that learns to ground roles in the regions depicting the corresponding participants. Experimental results demonstrate that we can train a vSRL model without relying on prohibitively expensive image-based role annotations, by utilizing noisy data which we extract automatically from image captions using a linguistic SRL system. Furthermore, our model induces frame-semantic visual representations, and comparing them to previous work on supervised visual verb sense disambiguation yields overall better results.
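The grounding step described above can be sketched as a nearest-region lookup: for each semantic role, pick the candidate object region whose embedding best matches the role's representation. The minimal example below is illustrative only and is not the authors' model; the cosine-similarity scoring, the function names, and the toy embeddings are all assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def ground_role(role_vec, region_vecs):
    """Return the index of the image region whose embedding is most
    similar to the role embedding, i.e. the region predicted to depict
    the role's participant (hypothetical scoring, for illustration)."""
    scores = [cosine(role_vec, r) for r in region_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy example: one role embedding, three candidate region embeddings.
role = [1.0, 0.0, 0.0, 0.0]
regions = [
    [0.1, 0.9, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],  # most aligned with the role vector
    [0.0, 0.0, 1.0, 0.0],
]
print(ground_role(role, regions))  # → 1
```

In the paper's setting, the training signal for such a scorer comes from noisy role labels produced by running a linguistic SRL system over image captions, rather than from manual region-level role annotations.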
