Language Controls More Than Top-Down Attention: Modulating Bottom-Up Visual Processing with Referring Expressions

2021-01-01

Ozan Arkan Can, Ilker Kesen, Deniz Yuret

Abstract

How to best integrate linguistic and perceptual processing in multimodal tasks is an important open problem. In this work we argue that the common technique of using language to direct visual attention over high-level visual features may not be optimal. Using language throughout the bottom-up visual pathway, going from pixels to high-level features, may be necessary. Our experiments on several English referring expression datasets show significant improvements when language is used to control the filters for bottom-up visual processing in addition to top-down attention.
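
To make the contrast in the abstract concrete, the following is a minimal sketch and not the authors' model, whose architecture the abstract does not specify. It assumes a single layer in which a sentence embedding generates a 1x1 convolution kernel applied to raw pixels (the bottom-up path), followed by language-conditioned spatial attention over the resulting features (the top-down path). The function name, the weight matrices `w_filter` and `w_attn`, and all shapes are hypothetical.

```python
import numpy as np


def language_conditioned_features(image, lang_vec, w_filter, w_attn):
    """Illustrative only; names and shapes are assumptions.

    image:    (H, W, C_in) pixel array
    lang_vec: (D,) sentence embedding of the referring expression
    w_filter: (D, C_in * C_out) maps language to a 1x1 conv kernel
    w_attn:   (D, C_out) maps language to an attention query
    """
    c_in = image.shape[-1]
    c_out = w_attn.shape[1]

    # Bottom-up: the language embedding generates the filter itself,
    # so the extracted features depend on the expression.
    kernel = (lang_vec @ w_filter).reshape(c_in, c_out)
    features = np.maximum(image @ kernel, 0.0)  # (H, W, C_out), ReLU

    # Top-down: the language embedding scores each spatial location
    # of the already-computed features (softmax over all H*W positions).
    query = lang_vec @ w_attn                   # (C_out,)
    scores = features @ query                   # (H, W)
    scores = scores - scores.max()              # numerical stability
    attn = np.exp(scores) / np.exp(scores).sum()
    return features, attn
```

In this sketch, changing the expression changes `features` as well as `attn`. A model that uses language only for top-down attention would instead compute `features` from a fixed kernel and let language affect `attn` alone.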