MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

2020-09-18 · EMNLP 2020

Tejas Gokhale, Pratyay Banerjee, Chitta Baral, Yezhou Yang

Code

github.com/tejasG53/vqa_mutant
Official · PyTorch · ★ 13
github.com/tejas-gokhale/vqa_mutant
PyTorch · ★ 13

Abstract

While progress has been made on the visual question answering leaderboards, models often utilize spurious correlations and priors in datasets under the i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input, to improve OOD generalization on benchmarks such as the VQA-CP challenge. Under this paradigm, models utilize a consistency-constrained training objective to understand the effect of semantic changes in the input (question-image pair) on the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on knowledge about the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
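
The abstract names a consistency-constrained training objective without giving its form. The block below is only an illustrative sketch of one way such a constraint could look, not the authors' implementation. It assumes the model outputs a probability vector over a fixed answer vocabulary for both the original and the mutated input, and it penalizes any mismatch between how much the prediction moved and how much the ground-truth answer moved. The function names, the choice of cosine distance, and the `weight` parameter are all hypothetical.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the target answer under a probability vector."""
    return -math.log(max(probs[target_index], 1e-12))


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def one_hot(index, size):
    """One-hot encoding of an answer index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def consistency_loss(p_orig, p_mut, y_orig, y_mut, weight=1.0):
    """Hypothetical objective: answer loss on both samples plus a pairwise term.

    p_orig, p_mut: predicted answer distributions for the original and mutated input.
    y_orig, y_mut: ground-truth answer indices for the original and mutated input.
    """
    n = len(p_orig)
    # Standard answering loss on the original sample and on its mutation.
    answer_loss = cross_entropy(p_orig, y_orig) + cross_entropy(p_mut, y_mut)
    # How far the prediction moved versus how far the correct answer moved.
    pred_shift = cosine_distance(p_orig, p_mut)
    target_shift = cosine_distance(one_hot(y_orig, n), one_hot(y_mut, n))
    # Assumed consistency term: the two shifts should agree.
    return answer_loss + weight * abs(pred_shift - target_shift)


# Toy answer vocabulary ["red", "green"]; the mutation flips the answer to "green".
tracks_mutation = consistency_loss([0.9, 0.1], [0.1, 0.9], y_orig=0, y_mut=1)
ignores_mutation = consistency_loss([0.9, 0.1], [0.9, 0.1], y_orig=0, y_mut=1)
print(tracks_mutation < ignores_mutation)
```

In this toy case a model that changes its answer when the mutation changes the correct answer receives a lower loss than one that repeats its original prediction, which is the behavior the abstract attributes to the consistency constraint.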

Tasks

Out-of-Distribution Generalization · Question Answering · Visual Question Answering · Visual Question Answering (VQA)
