A Meta-transfer Learning framework for Visually Grounded Compositional Concept Learning
Anonymous
Abstract
Humans acquire language in a compositional and grounded manner: they can describe their perceptual world using novel compositions of already learned elementary concepts. However, recent research shows that modern neural networks lack such compositional generalization ability. To address this challenge, we propose MetaVL, a meta-transfer learning framework that trains transformer-based vision-and-language (V&L) models with an optimization-based meta-learning method and episodic training. We carefully constructed two datasets based on MSCOCO and Flickr30K to specifically target novel compositional concept learning. Our empirical results show that MetaVL outperforms baseline models on both datasets. Moreover, MetaVL demonstrates higher sample efficiency than supervised learning, especially in the few-shot setting.
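To make the approach concrete, the following is a minimal sketch of optimization-based meta-learning with episodic training in the first-order MAML style, on a toy 1-D regression problem. All names, tasks, and hyperparameters here are illustrative assumptions, not the paper's actual model or data; in MetaVL the inner/outer loop would instead operate on a transformer-based V&L model over episodes of compositional concepts.

```python
# Illustrative first-order MAML-style episodic training on a toy linear model
# y = w * x. This is a didactic sketch, not the paper's implementation.

def loss(w, pair):
    """Squared error of the linear model on one (x, y) pair."""
    x, y = pair
    return (w * x - y) ** 2

def grad(w, pair):
    """Analytic gradient of the squared-error loss with respect to w."""
    x, y = pair
    return 2 * (w * x - y) * x

def inner_adapt(w, support, lr=0.01, steps=3):
    """Task-specific adaptation: a few gradient steps on the support set."""
    for _ in range(steps):
        for pair in support:
            w = w - lr * grad(w, pair)
    return w

def meta_train(episodes, meta_lr=0.01, epochs=50):
    """Outer loop over episodes, each a (support set, query set) pair.

    First-order variant: the gradient of the query loss at the adapted
    parameters is applied directly to the meta-parameters.
    """
    w = 0.0
    for _ in range(epochs):
        for support, query in episodes:
            w_task = inner_adapt(w, support)
            for pair in query:
                w = w - meta_lr * grad(w_task, pair)
    return w

# Two episodes drawn from the same underlying "concept" y = 2x.
episodes = [([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]),
            ([(0.5, 1.0), (1.5, 3.0)], [(2.5, 5.0)])]
w = meta_train(episodes)
print(round(w, 2))  # → 2.0
```

The episodic structure (adapt on a support set, evaluate on a query set, update the shared initialization) is what lets the meta-learned parameters adapt quickly to novel tasks from few examples, which is the property the abstract's few-shot results rely on.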