eaVQA: An Experimental Analysis on Visual Question Answering Models

2021-12-01 · ICON 2021

Souvik Chowdhury, Badal Soni

Abstract

Visual Question Answering (VQA) has recently become a popular research area. The VQA problem lies at the boundary of the Computer Vision and Natural Language Processing research domains. In VQA research, the dataset is a very important aspect because of the variety in image types, i.e. natural and synthetic, and in question-answer sources, i.e. human-originated or computer-generated question-answer pairs. Various details about each dataset are given in this paper, which can help future researchers to a great extent. In this paper, we discuss and compare the experimental performance of the Stacked Attention Network Model (SANM), a bidirectional LSTM model, and a MUTAN-based fusion model. As per the experimental results, the MUTAN model achieves 29% accuracy with a loss of 3.5. The SANM model achieves 55% accuracy and a loss of 2.2, whereas the VQA model achieves 59% accuracy and a loss of 1.9.
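The MUTAN model mentioned above fuses the question and image representations with a rank-constrained (Tucker-style) bilinear interaction rather than simple concatenation. As a minimal sketch of that idea, the snippet below fuses a question vector and an image vector by projecting each modality per rank slice, multiplying elementwise, and summing over the rank dimension. All dimensions, weights, and the function name are illustrative assumptions, not the authors' actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: question feature size, image feature size,
# Tucker rank, and fused output size (not taken from the paper).
DQ, DV, RANK, DOUT = 16, 32, 4, 8

# Randomly initialised factor matrices, one slice per rank component.
Wq = rng.standard_normal((RANK, DQ, DOUT)) * 0.1
Wv = rng.standard_normal((RANK, DV, DOUT)) * 0.1

def mutan_fuse(q, v):
    """Rank-constrained bilinear fusion of a question vector q (DQ,)
    and an image vector v (DV,): project each modality per rank slice,
    multiply elementwise, and sum over the rank dimension."""
    zq = np.einsum('d,rdo->ro', q, Wq)   # shape (RANK, DOUT)
    zv = np.einsum('d,rdo->ro', v, Wv)   # shape (RANK, DOUT)
    return (zq * zv).sum(axis=0)         # shape (DOUT,)

q = rng.standard_normal(DQ)   # stand-in for an LSTM question encoding
v = rng.standard_normal(DV)   # stand-in for a CNN image encoding
fused = mutan_fuse(q, v)
print(fused.shape)
```

The rank constraint is what keeps this tractable: a full bilinear interaction between the two vectors would need a DQ × DV × DOUT tensor, while the factored form only stores RANK small slices per modality.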
