Leveraging Human Attention in Novel Object Captioning

2021-08-19 · Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence 2021

Xianyu Chen, Ming Jiang, Qi Zhao

Abstract

Image captioning models depend on training with paired image-text corpora, which poses various challenges in describing images containing novel objects absent from the training data. While previous novel object captioning methods rely on external image taggers or object detectors to describe novel objects, we present the Attention-based Novel Object Captioner (ANOC), which complements novel object captioners with human attention features that characterize generally important information independent of tasks. It introduces a gating mechanism that adaptively combines human attention with self-learned machine attention, along with a Constrained Self-Critical Sequence Training method that addresses exposure bias while maintaining the constraints of novel object descriptions. Extensive experiments conducted on the nocaps and Held-Out COCO datasets demonstrate that our method considerably outperforms the state-of-the-art novel object captioners. Our source code is available at https://github.com/chenxy99/ANOC.
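
To make the idea of a gate that adaptively mixes human and machine attention more concrete, here is a minimal NumPy sketch. It is not taken from the ANOC code base. The function name `gated_attention`, the scalar sigmoid gate, the parameters `w` and `b`, and the convex-combination form are all assumptions chosen for illustration, and the paper's actual formulation may differ.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def gated_attention(machine_att, human_att, hidden, w, b):
    """Blend two attention distributions over K image regions.

    Hypothetical illustration only: a scalar gate g is computed from the
    decoder hidden state, and the output is g * human + (1 - g) * machine.
    """
    g = sigmoid(float(hidden @ w + b))      # gate value in (0, 1)
    fused = g * human_att + (1.0 - g) * machine_att
    fused = fused / fused.sum()             # guard against rounding drift
    return fused, g


# Toy inputs (made up): three image regions, a 4-dimensional hidden state.
machine_att = np.array([0.7, 0.2, 0.1])     # self-learned attention weights
human_att = np.array([0.1, 0.1, 0.8])       # human attention weights
hidden = np.zeros(4)                        # zero state gives g = 0.5 when b = 0
w = np.ones(4)
b = 0.0

fused, g = gated_attention(machine_att, human_att, hidden, w, b)
print(g, fused)
```

With a gate near 1 the blended weights follow the human attention map, and with a gate near 0 they fall back to the machine attention. In a trained model, `w` and `b` would be learned rather than fixed as they are here.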
