
Similarity Maps for Self-Training Weakly-Supervised Phrase Grounding

2023-01-01 · CVPR 2023 · Code Available

Tal Shaharabany, Lior Wolf


Abstract

A phrase grounding model receives an input image and a text phrase and outputs a suitable localization map. We present an effective way to refine a phrase grounding model by considering self-similarity maps extracted from the latent representation of the model's image encoder. Our main insights are that these maps resemble localization maps and that, by combining such maps, one can obtain useful pseudo-labels for performing self-training. Our results surpass, by a large margin, the state of the art in weakly supervised phrase grounding. A similar gap in performance is obtained for a recently proposed downstream task called WWbL, in which the input image is given without any text. Our code is available as supplementary material.
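
As a rough illustration of the idea described in the abstract, the following NumPy sketch computes self-similarity maps from an encoder feature grid and combines a few of them into a binary pseudo-label. It is not the authors' released code. The cosine similarity measure, the choice of seed locations from the model's current localization map, the averaging step, and the names `self_similarity_maps`, `pseudo_label`, `top_k`, and `threshold` are all assumptions made for this example.

```python
import numpy as np


def self_similarity_maps(features: np.ndarray) -> np.ndarray:
    """Return one cosine-similarity map per spatial location.

    features: (H, W, C) latent grid from an image encoder (assumed layout).
    Output:   (H*W, H, W); map i compares location i with every location.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)
    norms = np.linalg.norm(flat, axis=1, keepdims=True)
    unit = flat / np.maximum(norms, 1e-8)  # avoid division by zero
    sim = unit @ unit.T                    # (H*W, H*W) cosine similarities
    return sim.reshape(h * w, h, w)


def pseudo_label(features: np.ndarray,
                 localization: np.ndarray,
                 top_k: int = 5,
                 threshold: float = 0.5) -> np.ndarray:
    """Combine self-similarity maps into a binary pseudo-label (hypothetical rule).

    Seeds are the top_k highest-scoring locations of the current
    localization map; their similarity maps are averaged, min-max
    normalized, and thresholded.
    """
    maps = self_similarity_maps(features)
    seeds = np.argsort(localization.ravel())[-top_k:]
    combined = maps[seeds].mean(axis=0)
    lo, hi = combined.min(), combined.max()
    combined = (combined - lo) / max(hi - lo, 1e-8)
    return (combined >= threshold).astype(np.float32)
```

In a self-training loop, a mask like this would serve as the target for refining the grounding model. How the paper actually selects and combines the maps is not stated in the abstract, so the rule above is only a placeholder.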
