A Corpus for Visual Question Answering Annotated with Frame Semantic Information

2020-05-01 · LREC 2020

Mehrdad Alizadeh, Barbara Di Eugenio


Abstract

Visual Question Answering (VQA) has been widely explored as a computer vision problem; however, enriching VQA systems with linguistic information is necessary for tackling the complexity of the task. The language understanding component can play a major role, especially for questions about events or actions expressed via verbs. We hypothesize that if a question focuses on events described by verbs, then the model should be aware of, or trained with, verb semantics as expressed via semantic role labels, argument types, and/or frame elements. Unfortunately, no existing VQA dataset includes verb semantic information. We therefore created a new VQA dataset annotated with verb semantic information, called imSituVQA. imSituVQA is built by taking advantage of the imSitu dataset annotations; imSitu consists of images manually labeled with semantic frame elements, mostly taken from FrameNet.
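To make the idea concrete, the sketch below shows how FrameNet-style frame elements attached to a verb could be turned into question-answer pairs. This is a minimal illustration, not the actual imSitu or imSituVQA schema: the field names, roles, and question template are assumptions invented for this example.

```python
# Illustrative only: NOT the real imSitu/imSituVQA format.
# Field names ("verb", "frame_elements") and the question template
# are assumptions made up for this sketch.
frame_annotation = {
    "verb": "carrying",
    "frame_elements": {   # FrameNet-style semantic roles and their fillers
        "agent": "woman",
        "item": "box",
        "place": "street",
    },
}

def make_qa_pairs(annotation):
    """Derive one (question, answer) pair per annotated frame element."""
    verb = annotation["verb"]
    return [
        (f"What is the {role} of the {verb} action?", filler)
        for role, filler in annotation["frame_elements"].items()
    ]

pairs = make_qa_pairs(frame_annotation)
# e.g. pairs[0] == ("What is the agent of the carrying action?", "woman")
```

A model trained on pairs like these would see the verb's semantic roles explicitly, which is the kind of supervision the abstract argues is missing from existing VQA datasets.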
